Let us start from some notation. In what follows, we denote by \({\mathbb {E}}\) a finite-dimensional real vector space and by \({\mathbb {E}}^{*}\) its dual space, which is a space of linear functions on \({\mathbb {E}}\). The value of function \(s \in {\mathbb {E}}^{*}\) at point \(x \in {\mathbb {E}}\) is denoted by \(\langle s, x \rangle \). Let us fix some linear self-adjoint positive-definite operator \(B: {\mathbb {E}}\rightarrow {\mathbb {E}}^{*}\) and introduce the following Euclidean norms in the primal and dual spaces:
$$\begin{aligned} \begin{array}{rcl} \Vert x\Vert:= & {} \langle Bx, x \rangle ^{1/2}, \quad x \in {\mathbb {E}}, \qquad \Vert s\Vert _{*} \; :=\; \langle s, B^{-1}s \rangle ^{1/2}, \quad s \in {\mathbb {E}}^{*}. \end{array} \end{aligned}$$
For any linear operator \(A: {\mathbb {E}}\rightarrow {\mathbb {E}}^{*}\), its norm is induced in a standard way:
$$\begin{aligned} \begin{array}{rcl} \Vert A\Vert:= & {} \max \limits _{x \in {\mathbb {E}}} \bigl \{ \Vert Ax\Vert _{*} \; | \; \Vert x\Vert \le 1 \bigr \}. \end{array} \end{aligned}$$
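These norms are used throughout to measure points, gradients, and Hessians. As a minimal numerical sketch (with an arbitrary, randomly generated positive-definite matrix B chosen only for illustration), the following Python code computes the primal norm, the dual norm, and the induced operator norm; in particular, it confirms that \(\Vert B\Vert = 1\) in this pairing.

```python
# A minimal numerical sketch: primal norm, dual norm, and induced operator
# norm, for an arbitrary (randomly generated) positive-definite matrix B.
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)                    # self-adjoint, B > 0

def primal_norm(x):                            # ||x||   = <Bx, x>^{1/2}
    return np.sqrt(x @ B @ x)

def dual_norm(s):                              # ||s||_* = <s, B^{-1}s>^{1/2}
    return np.sqrt(s @ np.linalg.solve(B, s))

w, V = np.linalg.eigh(B)
B_inv_half = V @ np.diag(w ** -0.5) @ V.T      # symmetric inverse square root of B

def operator_norm(A):
    # ||A|| = max_{||x|| <= 1} ||Ax||_*; after the change of variables
    # x = B^{-1/2} u this is the largest singular value of B^{-1/2} A B^{-1/2}.
    return np.linalg.norm(B_inv_half @ A @ B_inv_half, 2)

x = rng.standard_normal(n)
print(np.isclose(primal_norm(x), dual_norm(B @ x)))   # ||Bx||_* = ||x||
print(np.isclose(operator_norm(B), 1.0))              # ||B|| = 1 in this pairing
```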
Our goal is to solve the convex optimization problem in the composite form:
$$\begin{aligned} \begin{array}{rcl} \min \limits _{x \in {\text {dom}}F} F(x):= & {} f(x) + h(x), \end{array} \end{aligned}$$
(1)
where f is a uniformly convex function which is twice differentiable on its open domain, and h is a simple closed convex function with \({\text {dom}}h \subseteq {\text {dom}}f\). By simple, we mean that all auxiliary subproblems with an explicit presence of h are easily solvable.
For a smooth function f, its gradient at a point x is denoted by \(\nabla f(x) \in {\mathbb {E}}^{*}\), and its Hessian is denoted by \(\nabla ^2 f(x) : {\mathbb {E}}\rightarrow {\mathbb {E}}^{*}\). For a convex but not necessarily differentiable function h, we denote by \(\partial h(x) \subset {\mathbb {E}}^{*}\) its subdifferential at the point \(x \in {\text {dom}}h\).
We say that a differentiable function f is uniformly convex of degree \(p \ge 2\) on a convex set \(C \subseteq {\text {dom}}f\) if for some constant \(\sigma > 0\) it satisfies the inequality
$$\begin{aligned} \begin{array}{rcl} f(y)&\; \ge \;&f(x) + \langle \nabla f(x), y - x \rangle + \frac{\sigma }{p}\Vert y - x\Vert ^p, \qquad x, y \in C. \end{array} \end{aligned}$$
(2)
Uniformly convex functions of degree \(p = 2\) are known as strongly convex. If inequality (2) holds with \(\sigma = 0\), the function f is called just convex. The following convenient condition is sufficient for function f to be uniformly convex on a convex set \(C \subseteq {\text {dom}}f\):
Lemma 2.1
(Lemma 1 in [14]) Let, for some \(\sigma > 0\) and \(p \ge 2\), the following inequality hold:
$$\begin{aligned} \begin{array}{rcl} \langle \nabla f(x) - \nabla f(y), x - y \rangle&\; \ge \;&\sigma \Vert x - y\Vert ^p, \qquad x, y \in C. \end{array} \end{aligned}$$
(3)
Then, function f is uniformly convex of degree p on set C with parameter \(\sigma \).
From now on, we assume \( C \; := \; {\text {dom}}F \; \subseteq \; {\text {dom}}f. \) By the composite representation (1), we have for every \(x \in {\text {dom}}F\) and for all \(F'(x) \in \partial F(x)\):
$$\begin{aligned} \begin{array}{rcl} F(y)\ge & {} F(x) + \langle F'(x), y - x \rangle + \frac{\sigma }{p}\Vert x - y\Vert ^p, \qquad y \in {\text {dom}}F. \end{array} \end{aligned}$$
(4)
Therefore, if \(\sigma > 0\), then there can be at most one point \(x^{*} \in {\text {dom}}F\) with \(F(x^{*}) = F^{*}\); such a point always exists since F is uniformly convex and closed. A useful consequence of uniform convexity is the following upper bound for the residual.
Lemma 2.2
Let f be uniformly convex of degree \(p \ge 2\) with constant \(\sigma > 0\) on set \({\text {dom}}F\). Then, for every \(x \in {\text {dom}}F\) and for all \(F'(x) \in \partial F(x)\) we have
$$\begin{aligned} \begin{array}{rcl} F(x) - F^{*}&\; \le \;&\frac{p - 1}{p} \left( \frac{1}{\sigma } \right) ^{\frac{1}{p - 1}} \Vert F'(x)\Vert _{*}^{\frac{p}{p - 1}}. \end{array} \end{aligned}$$
(5)
Proof
In view of (4), bound (5) follows as in the proof of Lemma 3 in [14]. \(\square \)
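For the simplest case \(p = 2\) and \(h \equiv 0\), bound (5) is tight: for the strongly convex quadratic \(f(x) = \frac{\sigma }{2}\langle Bx, x \rangle \) it holds with equality. A minimal numerical sketch of this fact (B, \(\sigma \), and the test points are arbitrary illustrative choices):

```python
# A minimal sketch of bound (5) for p = 2, h = 0 and the strongly convex
# quadratic f(x) = (sigma/2) <Bx, x>; here x^* = 0, F^* = 0, and (5) holds
# with equality.  B, sigma and the test points are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, p = 5, 0.7, 2.0
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)

primal_norm = lambda x: np.sqrt(x @ B @ x)
dual_norm   = lambda s: np.sqrt(s @ np.linalg.solve(B, s))

for _ in range(3):
    x = rng.standard_normal(n)
    residual = 0.5 * sigma * primal_norm(x) ** 2                  # F(x) - F^*
    grad = sigma * B @ x                                          # F'(x)
    bound = (p - 1) / p * (1 / sigma) ** (1 / (p - 1)) * dual_norm(grad) ** (p / (p - 1))
    print(np.isclose(residual, bound))                            # equality, up to rounding
```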
It is reasonable to define the best possible constant \(\sigma \) in inequality (3) for a certain degree p. This leads us to a system of constants:
$$\begin{aligned} \begin{array}{rcl} \sigma _{\!f}(p)&{} \; :=\; &{} \inf \limits _{\begin{array}{c} x, y \, \in \, {\text {dom}}F \\ x \not = y \end{array}} \frac{\langle \nabla f(x) - \nabla f(y), x - y \rangle }{\Vert x - y\Vert ^p}, \qquad p \ge 2. \end{array} \end{aligned}$$
(6)
We prefer to use inequality (3) for the definition of \(\sigma _{\!f}(p)\), instead of (2), because of its symmetry in x and y. Note that the value \(\sigma _{\!f}(p)\) also depends on the domain of F. However, we omit this dependence in our notation since it is always clear from the context.
It is easy to see that the univariate function \(\sigma _{f}(\cdot )\) is log-concave. Thus, for all \(p_2 > p_1 \ge 2\) we have:
$$\begin{aligned} \begin{array}{rcl} \sigma _{f}(p)&\; \ge \;&\bigl ( \sigma _{f}(p_1) \bigr )^{\frac{p_2 - p}{p_2 - p_1}} \cdot \bigl ( \sigma _{f}(p_2) \bigr )^{\frac{p - p_1}{p_2 - p_1}}, \qquad p \in [p_1, p_2]. \end{array} \end{aligned}$$
(7)
For a twice-differentiable function f, we say that it has a Hölder continuous Hessian of degree \(\nu \in [0, 1]\) on a convex set \(C \subseteq {\text {dom}}f\) if, for some constant \({\mathcal {H}}\), it holds:
$$\begin{aligned} \begin{array}{rcl} \Vert \nabla ^2 f(x) - \nabla ^2 f(y) \Vert\le & {} {\mathcal {H}} \Vert x - y\Vert ^{\nu }, \qquad x, y \in C. \end{array} \end{aligned}$$
(8)
Two simple consequences of (8) are as follows:
$$\begin{aligned}&\Vert \nabla f(y) - \nabla f(x) - \nabla ^2 f(x)(y - x) \Vert _{*} \le \frac{{\mathcal {H}}\Vert x - y\Vert ^{1 + \nu }}{1 + \nu }, \end{aligned}$$
(9)
$$\begin{aligned}&| f(y) - Q(x; y) | \le \frac{{\mathcal {H}}\Vert x - y\Vert ^{2 + \nu }}{(1 + \nu )(2 + \nu )}, \end{aligned}$$
(10)
where Q(x; y) is the quadratic model of f at the point x:
$$\begin{aligned} \begin{array}{rcl} Q(x; y)&\; :=\;&f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \langle \nabla ^2 f(x) (y - x), y - x \rangle . \end{array} \end{aligned}$$
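As a small one-dimensional illustration of (9) and (10), one may take \({\mathbb {E}}= {\mathbb {R}}\), \(B = 1\), and \(f(t) = t^4/12\) on \(C = [-1, 1]\); then \(|f''(x) - f''(y)| = |x + y| \, |x - y| \le 2 |x - y|\), so we may use \({\mathcal {H}} = 2\) and \(\nu = 1\). The following sketch checks both bounds on a grid (the function and the grid are arbitrary illustrative choices):

```python
# A one-dimensional sketch of the bounds (9) and (10) for f(t) = t^4/12 on
# C = [-1, 1] with B = 1, H = 2, nu = 1 (an arbitrary illustrative choice).
import numpy as np

H, nu = 2.0, 1.0
f   = lambda t: t ** 4 / 12
df  = lambda t: t ** 3 / 3
d2f = lambda t: t ** 2

def Q(x, y):                    # quadratic model of f at the point x
    return f(x) + df(x) * (y - x) + 0.5 * d2f(x) * (y - x) ** 2

ok9 = ok10 = True
grid = np.linspace(-1.0, 1.0, 41)
for x in grid:
    for y in grid:
        r = abs(y - x)
        ok9  &= abs(df(y) - df(x) - d2f(x) * (y - x)) <= H * r ** (1 + nu) / (1 + nu) + 1e-12
        ok10 &= abs(f(y) - Q(x, y)) <= H * r ** (2 + nu) / ((1 + nu) * (2 + nu)) + 1e-12
print(ok9, ok10)                # both True on this grid
```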
In order to characterize the level of smoothness of function f on the set \(C := {\text {dom}}F\), let us define the system of Hölder constants (see [10]):
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{\!f}(\nu )&{} \; :=&{}\; \sup \limits _{\begin{array}{c} x, y \in {\text {dom}}F \\ x \not = y \end{array}} \frac{\Vert \nabla ^2 f(x) - \nabla ^2 f(y) \Vert }{\Vert x - y\Vert ^{\nu }}, \qquad \nu \in [0, 1]. \end{array} \end{aligned}$$
(11)
We allow \({\mathcal {H}}_{\!f}(\nu )\) to be equal to \(+\infty \) for some \(\nu \). Note that function \({\mathcal {H}}_{f}( \cdot )\) is log-convex. Thus, any \(0 \le \nu _1 < \nu _2 \le 1\) such that \({\mathcal {H}}_{f}(\nu _i) < +\infty , i = 1,2\), provide us with the following upper bounds for the whole interval:
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{f}(\nu )&\; \le \;&\bigl ( {\mathcal {H}}_{f}(\nu _1) \bigr )^{\frac{\nu _2 - \nu }{\nu _2 - \nu _1}} \cdot \bigl ( {\mathcal {H}}_{f}(\nu _2) \bigr )^{\frac{\nu - \nu _1}{\nu _2 - \nu _1}}, \qquad \nu \in [\nu _1, \nu _2]. \end{array} \end{aligned}$$
(12)
If for some specific \(\nu \in [0, 1]\) we have \({\mathcal {H}}_{\!f}(\nu )= 0\), this implies that \(\nabla ^2 f(x) = \nabla ^2 f(y)\) for all \(x, y \in {\text {dom}}F\). In this case, the restriction \(\left. f\right| _{{\text {dom}}F}\) is a quadratic function, and we conclude that \({\mathcal {H}}_{\!f}(\nu )= 0\) for all \(\nu \in [0, 1]\). At the same time, any two points \(x, y \in {\text {dom}}F\) with \(0 < \Vert x - y\Vert \le 1\) give a simple uniform lower bound for all constants \({\mathcal {H}}_{\!f}(\nu )\):
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{\!f}(\nu )&\; \ge \;&\Vert \nabla ^2 f(x) - \nabla ^2 f(y) \Vert , \qquad \nu \in [0, 1]. \end{array} \end{aligned}$$
Let us give an example of a function whose Hessian is Hölder continuous for all \(\nu \in [0, 1]\).
Example 2.1
For a given \(a_i \in {\mathbb {E}}^{*}\), \(1 \le i \le m\), consider the following convex function:
$$\begin{aligned} \begin{array}{rcl} f(x)&\; = \;&\ln \left( \sum \limits _{i = 1}^m e^{\langle a_i, x \rangle } \right) , \quad x \in {\mathbb {E}}. \end{array} \end{aligned}$$
Let us fix the Euclidean norm \(\Vert x\Vert = \langle Bx, x \rangle ^{1/2}, x \in {\mathbb {E}}\), with the operator \(B := \sum _{i = 1}^m a_i a_i^{*}\). Without loss of generality, we assume that \(B \succ 0\) (otherwise, we can reduce the dimension of the problem). Then,
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{\!f}(0)&\; \le \;&1, \quad {\mathcal {H}}_{\!f}(1) \;\, \le \,\; 2. \end{array} \end{aligned}$$
Therefore, by (12) we get, for any \(\nu \in [0, 1]\):
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{\!f}(\nu )&\; \le \;&2^{\nu }. \end{array} \end{aligned}$$
Proof
Denote \(\kappa (x) \equiv \sum _{i = 1}^m e^{\langle a_i, x \rangle }\). Let us fix an arbitrary point \(x \in {\mathbb {E}}\) and a direction \(h \in {\mathbb {E}}\). Then, a straightforward computation gives:
$$\begin{aligned} \langle \nabla f(x), h \rangle= & {} \frac{1}{\kappa (x)} \sum _{i = 1}^m e^{\langle a_i, x \rangle } \langle a_i, h \rangle , \\ \langle \nabla ^2 f(x)h, h \rangle= & {} \frac{1}{\kappa (x)} \sum _{i = 1}^m e^{\langle a_i, x \rangle } \langle a_i, h \rangle ^2 - \bigl ( \frac{1}{\kappa (x)} \sum _{i = 1}^m e^{\langle a_i, x \rangle } \langle a_i, h \rangle \bigr )^2 \\= & {} \frac{1}{\kappa (x)} \sum _{i = 1}^m e^{\langle a_i, x \rangle } \left( \langle a_i, h \rangle - \langle \nabla f(x), h \rangle \right) ^2 \; \ge \; 0. \end{aligned}$$
Hence, we get
$$\begin{aligned} \begin{array}{rcl} \Vert \nabla ^2 f(x) \Vert= & {} \max \limits _{\Vert h\Vert \le 1} \langle \nabla ^2 f(x) h, h \rangle \; \le \; \max \limits _{\Vert h\Vert \le 1} \sum _{i = 1}^m \langle a_i, h \rangle ^2 \; = \; \max \limits _{\Vert h\Vert \le 1} \Vert h\Vert ^2 \; = \; 1. \end{array} \end{aligned}$$
Since all Hessians of function f are positive semidefinite, we conclude that \({\mathcal {H}}_{\!f}(0) \le 1\). The inequality \({\mathcal {H}}_{\!f}(1) \le 2\) can be easily obtained from the following representation of the third derivative:
$$\begin{aligned} f'''(x)[h,h,h]= & {} {1 \over \kappa (x)} \sum \limits _{i=1}^m e^{\langle a_i, x \rangle } \left( \langle a_i, h \rangle - \langle \nabla f(x), h \rangle \right) ^3\\\le & {} \langle \nabla ^2 f(x) h, h \rangle \max \limits _{1 \le i,j \le m } \langle a_i - a_j , h \rangle \; \le \; 2 \Vert h \Vert ^3. \end{aligned}$$
\(\square \)
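A numerical sketch illustrating Example 2.1 (the vectors \(a_i\) are arbitrary random data): the sampled operator norms of the Hessians and of their differences stay below the bounds 1 and 2, in agreement with \({\mathcal {H}}_{\!f}(0) \le 1\) and \({\mathcal {H}}_{\!f}(1) \le 2\).

```python
# A sketch for Example 2.1: log-sum-exp with the norm induced by
# B = sum_i a_i a_i^T.  The vectors a_i are arbitrary random data.
import numpy as np

rng = np.random.default_rng(2)
m, n = 7, 3
A = rng.standard_normal((m, n))                 # rows are the vectors a_i
B = A.T @ A                                      # assumed nondegenerate here
w, V = np.linalg.eigh(B)
B_inv_half = V @ np.diag(w ** -0.5) @ V.T

def hessian(x):
    z = A @ x
    pi = np.exp(z - z.max()); pi /= pi.sum()     # softmax weights
    g = A.T @ pi                                  # gradient of f
    return A.T @ (pi[:, None] * A) - np.outer(g, g)

op_norm     = lambda Hm: np.linalg.norm(B_inv_half @ Hm @ B_inv_half, 2)
primal_norm = lambda x: np.sqrt(x @ B @ x)

h_max = d0 = d1 = 0.0
for _ in range(200):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    D = hessian(x) - hessian(y)
    h_max = max(h_max, op_norm(hessian(x)))                  # stays <= 1
    d0 = max(d0, op_norm(D))                                  # sampled ratio for H_f(0) <= 1
    d1 = max(d1, op_norm(D) / primal_norm(x - y))             # sampled ratio for H_f(1) <= 2
print(h_max <= 1 + 1e-9, d0 <= 1 + 1e-9, d1 <= 2 + 1e-9)      # True True True
```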
Let us imagine now that we want to describe the iteration complexity of some method that solves the composite optimization problem (1) up to an absolute accuracy \(\varepsilon > 0\) in the function value. We assume that the smooth part f of its objective is uniformly convex and has a Hölder continuous Hessian. Which degrees p and \(\nu \) should be used in our analysis? Suppose that, for the number of calls of the oracle, we are interested in obtaining a polynomial-time bound of the form:
$$\begin{aligned} \begin{array}{c} O\left( ({\mathcal {H}}_{\!f}(\nu ))^{\alpha } \cdot (\sigma _{\!f}(p))^{\beta } \cdot \log \frac{F(x_0) - F^{*} \,}{\varepsilon }\right) , \quad \alpha ,\beta \ne 0. \end{array} \end{aligned}$$
Denote by \([x ]\) the physical dimension of variable \(x \in {\mathbb {E}}\), and by \([f ]\) the physical dimension of the value f(x). Then, we have \([\nabla f(x) ]= [f ]/ [x ]\) and \([\nabla ^2f(x) ]= [f ]/ [x ]^2\). This gives us
$$\begin{aligned} \begin{array}{c} [{\mathcal {H}}_{\!f}(\nu )]\; = \; \frac{ [f ]}{ [x ]^{2 + \nu } }, \quad [\sigma _{\!f}(p)]= \frac{[f ]}{[x ]^p}, \quad [\, ({\mathcal {H}}_{\!f}(\nu ))^{\alpha } \cdot (\sigma _{\!f}(p))^{\beta } \, ]= \frac{[f ]^{\alpha + \beta }}{ [x ]^{\alpha (2 + \nu ) + \beta p} }. \end{array} \end{aligned}$$
While x and f(x) can be measured in arbitrary physical quantities, the value “number of iterations” cannot have physical dimension. This leads to the following relations:
$$\begin{aligned} \alpha + \beta = 0 \qquad \text {and} \qquad \alpha (2 + \nu ) + \beta p = 0. \end{aligned}$$
Therefore, despite the fact that our function can belong to several problem classes simultaneously, from the physical point of view only one option is available:
$$\begin{aligned} \boxed {p = 2 + \nu } \end{aligned}$$
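This conclusion can also be verified symbolically: eliminating \(\beta = -\alpha \) from the first relation and solving the second one for p gives \(p = 2 + \nu \). A tiny sketch (using sympy; the symbols below are our own notation):

```python
# A symbolic check of the dimensional analysis (symbols are our own notation):
# eliminating beta = -alpha and requiring the exponent of [x] to vanish.
import sympy as sp

alpha, nu, p = sp.symbols('alpha nu p', positive=True)
beta = -alpha                                    # from  alpha + beta = 0
exponent_of_x = alpha * (2 + nu) + beta * p      # must vanish for [x] to cancel
print(sp.solve(sp.Eq(exponent_of_x, 0), p))      # -> [nu + 2]
```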
Hence, for a twice-differentiable convex function f with \(\inf _{\nu \in [0, 1]} {\mathcal {H}}_{\!f}(\nu )> 0\), we can define only one meaningful condition number of degree \(\nu \in [0, 1]\):
$$\begin{aligned} \begin{array}{rcl} \gamma _f(\nu ) \; :=\; \frac{\sigma _f(2 + \nu )}{{\mathcal {H}}_{\!f}(\nu )}. \end{array} \end{aligned}$$
(13)
If for some particular \(\nu \) we have \({\mathcal {H}}_{\!f}(\nu )= +\infty \), then by our definition: \(\gamma _f(\nu ) = 0\).
It will be shown that the condition number \(\gamma _f(\nu )\) serves as the main factor in the global iteration complexity bounds for the regularized Newton method applied to problem (1). Let us prove that this number cannot be too big.
Lemma 2.3
Let \(\inf _{\nu \in [0, 1]} {\mathcal {H}}_{\!f}(\nu )> 0\) and therefore the condition number \(\gamma _f(\cdot )\) be well defined. Then,
$$\begin{aligned} \begin{array}{rcl} \gamma _f(\nu )&\quad \le \quad&\frac{1}{1 + \nu } \;\; + \; \inf \limits _{x, y \in {\text {dom}}F} \frac{\Vert \nabla ^2 f(x) \Vert }{\Vert \nabla ^2 f(y) - \nabla ^2 f(x) \Vert }, \qquad \nu \in [0, 1]. \end{array} \end{aligned}$$
(14)
If \({\text {dom}}F\) is unbounded, that is, \(\sup _{x \in {\text {dom}}F} \Vert x\Vert = +\infty \), then
$$\begin{aligned} \begin{array}{rcl} \gamma _f(\nu )&\quad \le \quad&\frac{1}{1 + \nu }, \qquad \nu \in (0, 1]. \end{array} \end{aligned}$$
(15)
Proof
Indeed, for any \(x, y \in {\text {dom}}F\), \(x \not = y\), we have:
$$\begin{aligned}&\sigma _f(2 + \nu ) \quad \overset{(6)}{\le } \quad \frac{\langle \nabla f(y) - \nabla f(x), y - x \rangle }{\Vert y - x\Vert ^{2 + \nu }} \\&\quad = \quad \frac{\langle \nabla f(y) - \nabla f(x) - \nabla ^2 f(x)(y - x), y - x \rangle }{\Vert y - x\Vert ^{2 + \nu }} \; + \; \frac{\langle \nabla ^2 f(x)(y - x), y - x \rangle }{\Vert y - x\Vert ^{2 + \nu }} \\&\quad \overset{(9)}{\le } \quad \frac{{\mathcal {H}}_{\!f}(\nu )}{1 + \nu } \; + \; \frac{\Vert \nabla ^2 f(x) \Vert }{\Vert y - x\Vert ^{\nu }}. \end{aligned}$$
Now, dividing both sides of this inequality by \({\mathcal {H}}_{\!f}(\nu )\), we get inequality (14) from the definition of \({\mathcal {H}}_{\!f}(\nu )\) (11). Inequality (15) can be obtained by taking the limit \(\Vert y\Vert \rightarrow +\infty \). \(\square \)
From inequalities (7) and (12), we can get the following lower bound:
$$\begin{aligned} \begin{array}{rcl} \gamma _f(\nu )&\; \ge \;&\bigl ( \gamma _f(\nu _1) \bigr )^{\frac{\nu _2 - \nu }{\nu _2 - \nu _1}} \cdot \bigl ( \gamma _f(\nu _2) \bigr )^{\frac{\nu - \nu _1}{\nu _2 - \nu _1}}, \qquad \nu \in [\nu _1, \nu _2], \end{array} \end{aligned}$$
where \(0 \le \nu _1 < \nu _2 \le 1\). However, it turns out that in the unbounded case we can have a nonzero condition number \(\gamma _{f}(\nu )\) only for a single degree.
Lemma 2.4
Let \({\text {dom}}F\) be unbounded: \(\sup _{x \in {\text {dom}}F} \Vert x\Vert = +\infty \). Assume that for a fixed \(\nu \in [0, 1]\) we have \(\gamma _f(\nu ) > 0\). Then,
$$\begin{aligned} \gamma _f(\alpha ) = 0 \quad \text {for all} \quad \alpha \in [0, 1] \setminus \{ \nu \}. \end{aligned}$$
Proof
Consider firstly the case: \(\alpha > \nu \). From the condition \(\gamma _f(\nu ) > 0\), we conclude that \({\mathcal {H}}_{\!f}(\nu ) < +\infty \). Then, for any \(x, y \in {\text {dom}}F\) we have:
$$\begin{aligned}&\frac{\sigma _{\!f}(2 + \alpha ) \Vert y - x\Vert ^{2 + \alpha }}{2 + \alpha } \quad \overset{(2)}{\le } \quad f(y) - f(x) - \langle \nabla f(x), y - x \rangle \\&\quad \overset{(10)}{\le } \quad \frac{1}{2}\langle \nabla ^2 f(x)(y - x), (y - x) \rangle + \frac{{\mathcal {H}}_{\!f}(\nu ) \Vert y - x\Vert ^{2 + \nu }}{(1 + \nu )(2 + \nu )}. \end{aligned}$$
Dividing both sides of this inequality by \(\Vert y - x\Vert ^{2 + \alpha }\), keeping x fixed, and letting \(\Vert y\Vert \rightarrow +\infty \), we get \(\sigma _{\!f}(2 + \alpha ) = 0\). Therefore, \(\gamma _f(\alpha ) = 0\). For the second case, \(\alpha < \nu \), we cannot have \(\gamma _f(\alpha ) > 0\), since the previous reasoning would then result in \(\gamma _f(\nu ) = 0\). \(\square \)
Let us look now at an important example of a uniformly convex function with Hölder continuous Hessian. It is convenient to start with some properties of powers of the Euclidean norm.
Lemma 2.5
For fixed real \(p\ge 1\), consider the following function:
$$\begin{aligned} \begin{array}{rcl} f_p(x) \; = \; \frac{1}{p} \Vert x \Vert ^{p}, \quad x \in {\mathbb {E}}. \end{array} \end{aligned}$$
1. For \(p \ge 2\), function \(f_p(\cdot )\) is uniformly convex of degree p:
$$\begin{aligned} \langle \nabla f_p(x) - \nabla f_p(y), x - y \rangle \quad \ge \quad 2^{2 - p} \Vert x - y\Vert ^{p}, \quad x, y \in {\mathbb {E}}. \end{aligned}$$
(16)
2. If \(1 \le p \le 2\), then function \(f_p(\cdot )\) has \(\nu \)-Hölder continuous gradient with \(\nu = p-1\):
$$\begin{aligned} \Vert \nabla f_p(x) - \nabla f_p(y) \Vert _* \le 2^{1-\nu } \Vert x - y \Vert ^{\nu }, \quad x, y \in {\mathbb {E}}. \end{aligned}$$
(17)
Proof
Firstly, recall two useful inequalities, which are valid for all \(a, b \ge 0\):
$$\begin{aligned}&|a^{\alpha } - b^{\alpha }| \; \le \; |a - b|^{\alpha }, \quad \text {when} \quad 0 \le \alpha \le 1, \end{aligned}$$
(18)
$$\begin{aligned}&|a^{\alpha } - b^{\alpha }| \; \ge \; |a - b|^{\alpha }, \quad \text {when} \quad \alpha \ge 1. \end{aligned}$$
(19)
Let us fix arbitrary \(x, y \in {\mathbb {E}}\). The left-hand side of inequality (16) equals
$$\begin{aligned} \langle \Vert x\Vert ^{p - 2}Bx - \Vert y\Vert ^{p - 2}By, x - y \rangle \; = \; \Vert x\Vert ^p + \Vert y\Vert ^p - \langle Bx, y \rangle ( \Vert x\Vert ^{p - 2} + \Vert y\Vert ^{p - 2} ), \end{aligned}$$
and we need to verify that it is not smaller than \( 2^{2 - p}\bigl [ \Vert x\Vert ^2 + \Vert y\Vert ^2 - 2 \langle Bx, y \rangle \bigr ]^{\frac{p}{2}}. \) The case \(x = 0\) or \(y = 0\) is trivial. Therefore, assume \(x \not = 0\) and \(y \not = 0\). Denoting \(\tau := \frac{\Vert y\Vert }{\Vert x\Vert }\), \(r := \frac{\langle Bx, y\rangle }{\Vert x\Vert \cdot \Vert y\Vert }\), we have the following statement to prove:
$$\begin{aligned} \begin{array}{rcl} 1 + \tau ^p\ge & {} r \tau (1 + \tau ^{p - 2}) + 2^{2 - p} \bigl [ 1 + \tau ^2 - 2 r \tau \bigr ]^{\frac{p}{2}} , \quad \tau > 0, \quad |r| \le 1. \end{array} \end{aligned}$$
Since the function on the right-hand side is convex in r, it suffices to check only the two extreme cases:
1. \(r = 1 \, : \quad \) \(1 + \tau ^{p} \; \ge \; \tau (1 + \tau ^{p - 2}) + 2^{2 - p} |1 - \tau |^p\), which is equivalent to \((1 - \tau ) (1 - \tau ^{p - 1}) \ge 2^{2 - p}|1 - \tau |^p\). This is true by (19).
2. \(r = -1\, : \quad \) \(1 + \tau ^{p} \; \ge \; -\tau (1 + \tau ^{p - 2}) + 2^{2 - p}(1 + \tau )^p\), which is equivalent to \((1 + \tau ^{p - 1}) \ge 2^{2 - p}(1 + \tau )^{p - 1} \). This is true in view of convexity of function \(\tau ^{p-1}\) for \(\tau \ge 0\).
Thus, we have proved (16). Let us prove the second statement. Consider the function \({\hat{f}}_q(s) = {1 \over q} \Vert s \Vert ^q_*\), \(s \in {\mathbb {E}}^*\), with \(q = {p \over p-1} \ge 2\). In view of our first statement, we have:
$$\begin{aligned} \begin{array}{rcl} \langle s_1 - s_2, \nabla {\hat{f}}_q(s_1) - \nabla {\hat{f}}_q(s_2) \rangle\ge & {} \left( {1 \over 2}\right) ^{q-2} \Vert s_1 - s_2 \Vert _*^q, \quad s_1, s_2 \in {\mathbb {E}}^*. \end{array} \end{aligned}$$
(20)
For arbitrary \(x_1, x_2 \in {\mathbb {E}}\), define \(s_i = \nabla f_p(x_i) = {B x_i \over \Vert x_i \Vert ^{2-p}} \), \(i = 1, 2\). Then \(\Vert s_i \Vert _* = \Vert x_i \Vert ^{p-1}\), and consequently,
$$\begin{aligned} \begin{array}{rcl} x_i= & {} \Vert x_i \Vert ^{2-p} B^{-1} s_i \; = \; \Vert s_i \Vert _{*}^{2-p \over p-1} B^{-1} s_i \; = \; \nabla {\hat{f}}_q(s_i). \end{array} \end{aligned}$$
Therefore, substituting these vectors in (20), we get
$$\begin{aligned} \left( {1 \over 2}\right) ^{q-2} \Vert \nabla f_p(x_1) - \nabla f_p(x_2) \Vert _*^q \le \langle \nabla f_p(x_1) - \nabla f_p(x_2), x_1 - x_2 \rangle . \end{aligned}$$
Thus, \(\Vert \nabla f_p(x_1) - \nabla f_p(x_2) \Vert _* \le 2^{q-2 \over q-1} \Vert x_1 - x_2 \Vert ^{1 \over q-1}\). It remains to note that \({1 \over q-1} = p-1 = \nu \). \(\square \)
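The following numerical sketch checks inequalities (16) and (17) on random pairs of points, for an arbitrary positive-definite B and for the illustrative values \(p = 3\) and \(p = 3/2\):

```python
# A sketch of Lemma 2.5 on random points: inequality (16) for p = 3 and
# inequality (17) for p = 3/2, with an arbitrary positive-definite B.
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)

pnorm = lambda x: np.sqrt(x @ B @ x)
dnorm = lambda s: np.sqrt(s @ np.linalg.solve(B, s))
grad_fp = lambda x, p: pnorm(x) ** (p - 2) * (B @ x)     # gradient of ||x||^p / p

ok16 = ok17 = True
for _ in range(200):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    p = 3.0                                              # case p >= 2, inequality (16)
    lhs = (grad_fp(x, p) - grad_fp(y, p)) @ (x - y)
    ok16 &= lhs >= 2 ** (2 - p) * pnorm(x - y) ** p - 1e-10
    p = 1.5; nu = p - 1                                  # case 1 <= p <= 2, inequality (17)
    lhs = dnorm(grad_fp(x, p) - grad_fp(y, p))
    ok17 &= lhs <= 2 ** (1 - nu) * pnorm(x - y) ** nu + 1e-10
print(ok16, ok17)                                        # True True
```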
Example 2.2
For real \(p \ge 2\) and arbitrary \(x_0 \in {\mathbb {E}}\), consider the following function:
$$\begin{aligned} \begin{array}{rcl} f(x) \; = \; \frac{1}{p} \Vert x - x_0\Vert ^{p} \; = \; f_p(x - x_0), \quad x \in {\mathbb {E}}. \end{array} \end{aligned}$$
Then, \(\sigma _{\!f}(p) \; = \; \left( \frac{1}{2} \right) ^{p - 2}\). Moreover, if \(p = 2 + \nu \) for some \(\nu \in (0, 1]\), then it holds
$$\begin{aligned} \begin{array}{rcl} {\mathcal {H}}_{\!f}(\nu )\; \le \; (1 + \nu )2^{1 - \nu }, \end{array} \end{aligned}$$
and \({\mathcal {H}}_{\!f}(\alpha ) \; = \; +\infty \), for all \(\alpha \in [0, 1] \setminus \{\nu \}\). Therefore, in this case we have \( \gamma _{f}(\nu )\; \ge \; \frac{1}{2(1 + \nu )}, \) and \(\gamma _{f}(\alpha ) = 0\) for all \(\alpha \in [0, 1] \setminus \{\nu \}\).
Proof
Without loss of generality, we may assume \(x_0 = 0\), since the quantities \(\sigma _{\!f}(p)\) and \({\mathcal {H}}_{\!f}(\nu )\) are invariant with respect to this shift. Let us take an arbitrary \(x \ne 0\) and set \(y := -x\). Then,
$$\begin{aligned} \langle \nabla f(x) - \nabla f(y), x - y \rangle \; = \; \langle \Vert x\Vert ^{p - 2} Bx + \Vert x\Vert ^{p - 2} Bx, 2 x \rangle \; = \; 4 \Vert x\Vert ^{p}. \end{aligned}$$
On the other hand, \(\Vert y - x \Vert ^p = 2^p \Vert x \Vert ^p\). Therefore, \(\sigma _{\!f}(p) {\mathop {\le }\limits ^{(6)}} 2^{2-p}\), and (16) tells us that this inequality is satisfied as equality.
Let us prove now that \({\mathcal {H}}_{\!f}(\nu )\le (1 + \nu )2^{1 - \nu }\) for \(p = 2 + \nu \) with some \(\nu \in (0, 1]\). This is
$$\begin{aligned} \Vert \nabla ^2 f(x) - \nabla ^2 f(y) \Vert \; \le \; (1 + \nu ) 2^{1 - \nu } \Vert x - y\Vert ^{\nu }, \quad x, y \in {\mathbb {E}}. \end{aligned}$$
(21)
The corresponding Hessians can be represented as follows:
$$\begin{aligned} \begin{array}{rcl} \nabla ^2 f(x)= & {} \Vert x\Vert ^{\nu } B + \frac{\nu B x x^{*} B}{\Vert x\Vert ^{2 - \nu }}, \quad x \in {\mathbb {E}}\setminus \{0\}, \qquad \nabla ^2 f(0) = 0. \end{array} \end{aligned}$$
For the case \(x = y = 0\), inequality (21) is trivial. Assume now that \(x \not = 0\). If \(0 \in [x, y]\), then \(y = -\beta x\) for some \(\beta \ge 0\) and we have:
$$\begin{aligned} \Vert \nabla ^2 f(x) - \nabla ^2 f(-\beta x) \Vert\le & {} |1 - \beta ^\nu | (1 + \nu ) \Vert x\Vert ^{\nu } \; \le \; (1 + \beta )^{\nu } (1 + \nu ) 2^{1 - \nu } \Vert x\Vert ^{\nu } \\= & {} (1 + \nu ) 2^{1 - \nu } \Vert x - y\Vert ^{\nu }, \end{aligned}$$
which is (21). Let \(0 \notin [x, y]\). For an arbitrary fixed direction \(h \in {\mathbb {E}}\), we get:
$$\begin{aligned} \begin{array}{rcl} \bigl | \bigl \langle (\nabla ^2 f(x) - \nabla ^2 f(y)) h, h \bigr \rangle \bigr |= & {} \Bigl | \left( \Vert x\Vert ^{\nu } - \Vert y\Vert ^{\nu } \right) \cdot \Vert h\Vert ^2 + \nu \cdot \left( \frac{\langle Bx, h \rangle ^2}{\Vert x\Vert ^{2 - \nu }} - \frac{\langle By, h \rangle ^2}{\Vert y\Vert ^{2 - \nu }} \right) \Bigr |. \end{array} \end{aligned}$$
Consider the points \(u = \frac{Bx}{\Vert x\Vert ^{1 - \nu }} = \nabla f_q(x)\) and \(v = \frac{By}{\Vert y\Vert ^{1 - \nu }} = \nabla f_q(y)\) with \(q = 1+\nu \). Then,
$$\begin{aligned} \begin{array}{c} \Vert x\Vert ^{\nu } = \Vert u\Vert _*, \quad \frac{\langle B x, h \rangle ^2}{\Vert x\Vert ^{2 - \nu }} = \frac{\langle u, h \rangle ^2}{\Vert u\Vert _*} \quad \text {and} \quad \Vert y\Vert ^{\nu } = \Vert v\Vert _*, \quad \frac{\langle By, h \rangle ^2}{\Vert y\Vert ^{2 - \nu }} = \frac{\langle v, h \rangle ^2}{\Vert v \Vert _*}. \end{array} \end{aligned}$$
Therefore,
$$\begin{aligned}&\bigl | \bigl \langle (\nabla ^2 f(x) - \nabla ^2 f(y)) h, h \bigr \rangle \bigr | \nonumber \\&\quad = \Bigl | \left( \Vert u\Vert _* - \Vert v\Vert _* \right) \cdot \Vert h\Vert ^2 \; + \; \nu \cdot \left( \frac{\langle u, h \rangle ^2}{\Vert u\Vert _*} - \frac{\langle v, h \rangle ^2}{\Vert v\Vert _*} \right) \Bigr |. \end{aligned}$$
(22)
Let us estimate the right-hand side of (22) from above. Consider a continuously differentiable univariate function:
$$\begin{aligned} \phi (\tau ):= & {} \Vert u(\tau )\Vert _* \cdot \Vert h\Vert ^2 + \nu \cdot \frac{\langle u(\tau ), h \rangle ^2}{\Vert u(\tau )\Vert _*}, \\ u(\tau ):= & {} u + \tau (v - u), \quad \tau \in [0, 1]. \end{aligned}$$
Note that
$$\begin{aligned} \phi ^{\prime }(\tau )= & {} \frac{\langle u(\tau ), B^{-1}(v-u)\rangle }{\Vert u(\tau )\Vert _*} \cdot \Vert h\Vert ^2 + \frac{2 \nu \langle u(\tau ), h \rangle \langle v-u, h \rangle }{\Vert u(\tau )\Vert _*} \;\\&- \; \frac{\nu \langle u(\tau ), h \rangle ^2 \langle u(\tau ), B^{-1}(v-u) \rangle }{\Vert u(\tau )\Vert _*^3}\\= & {} \frac{\langle u(\tau ), B^{-1}(v-u) \rangle }{\Vert u(\tau )\Vert _*} \cdot \underbrace{\left( \Vert h\Vert ^2 - \tfrac{\nu \langle u(\tau ), h \rangle ^2}{\Vert u(\tau )\Vert _*^2} \right) }_{\ge 0} \; + \; \frac{2\nu \langle u(\tau ), h \rangle \langle v-u, h\rangle }{\Vert u(\tau ) \Vert _*}. \end{aligned}$$
Denote \(\gamma := \frac{\langle u(\tau ), h \rangle }{\Vert u(\tau )\Vert _* \cdot \Vert h\Vert } \in [-1, 1]\). Then,
$$\begin{aligned} \bigl | \phi ^{\prime }(\tau ) \bigr | \; \le \; \Vert v - u\Vert _* \cdot \Vert h\Vert ^2 \cdot \bigl (1 - \nu \gamma ^2 + 2\nu |\gamma | \bigr ) \; \le \; (1 + \nu ) \cdot \Vert v-u\Vert _* \cdot \Vert h\Vert ^2. \end{aligned}$$
Thus, we have:
$$\begin{aligned} \bigl | \bigl \langle (\nabla ^2 f(x) - \nabla ^2 f(y)) h, h \bigr \rangle \bigr | \; = \; | \phi (1) - \phi (0) | \; \le \; (1 + \nu ) \cdot \Vert v - u\Vert _* \cdot \Vert h\Vert ^2. \end{aligned}$$
(23)
It remains to use the definition of u and v and apply inequality (17) with \(p=q\). Thus, we have proved that for \(p = 2 + \nu \) the Hessian of f is Hölder continuous of degree \(\nu \). At the same time, taking \(y = 0\), we get \(\Vert \nabla ^2 f(x) - \nabla ^2 f(y) \Vert = \Vert \nabla ^2 f(x) \Vert = (1 + \nu )\Vert x\Vert ^{\nu }\). These values cannot be uniformly bounded over \(x \in {\mathbb {E}}\) by any multiple of \(\Vert x\Vert ^{\alpha }\) with \(\alpha \ne \nu \). So, the Hessian of f is not Hölder continuous of any degree different from \(\nu \). \(\square \)
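A numerical sketch illustrating Example 2.2 (with \(x_0 = 0\), an arbitrary positive-definite B, and the illustrative value \(\nu = 1/2\)): the sampled Hölder ratios of the Hessian stay below \((1 + \nu ) 2^{1 - \nu }\), and the sampled uniform-convexity ratios stay above \(2^{2 - p} = 2^{-\nu }\).

```python
# A sketch of Example 2.2 with x_0 = 0, nu = 1/2, p = 2 + nu and an arbitrary
# positive-definite B: sampled Hoelder ratios of the Hessian and sampled
# uniform-convexity ratios.
import numpy as np

rng = np.random.default_rng(4)
n, nu = 3, 0.5
p = 2 + nu
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)
w, V = np.linalg.eigh(B)
B_inv_half = V @ np.diag(w ** -0.5) @ V.T

pnorm   = lambda x: np.sqrt(x @ B @ x)
op_norm = lambda Hm: np.linalg.norm(B_inv_half @ Hm @ B_inv_half, 2)
grad    = lambda x: pnorm(x) ** nu * (B @ x)

def hess(x):      # ||x||^nu B + nu B x x^* B / ||x||^{2 - nu}
    Bx = B @ x
    return pnorm(x) ** nu * B + nu * np.outer(Bx, Bx) / pnorm(x) ** (2 - nu)

holder_max, sigma_min = 0.0, np.inf
for _ in range(300):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    holder_max = max(holder_max, op_norm(hess(x) - hess(y)) / pnorm(x - y) ** nu)
    sigma_min  = min(sigma_min, (grad(x) - grad(y)) @ (x - y) / pnorm(x - y) ** p)
print(holder_max <= (1 + nu) * 2 ** (1 - nu) + 1e-9)    # consistent with (21)
print(sigma_min  >= 2 ** (2 - p) - 1e-9)                # consistent with sigma_f(p) = 2^{2-p}
```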
Remark 2.1
Inequalities (16) and (17) have the following symmetric consequences:
$$\begin{aligned} p \ge 2\Rightarrow & {} \Vert \nabla f_p(x) - \nabla f_p(y) \Vert _* \; \ge \; 2^{2-p} \Vert x - y \Vert ^{p-1}, \\ p \le 2\Rightarrow & {} \Vert \nabla f_p(x) - \nabla f_p(y) \Vert _* \; \le \; 2^{2-p} \Vert x - y \Vert ^{p-1}, \end{aligned}$$
which are valid for all \(x, y \in {\mathbb {E}}\).
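A short numerical sketch of these two consequences, using \(B = I\) (so that the primal and dual norms coincide with the standard Euclidean norm) and the illustrative values \(p = 5/2\) and \(p = 3/2\):

```python
# A short sketch of the two consequences in Remark 2.1 with B = I (standard
# Euclidean norm) and the illustrative values p = 5/2 and p = 3/2.
import numpy as np

rng = np.random.default_rng(5)
n = 4
grad_fp = lambda x, p: np.linalg.norm(x) ** (p - 2) * x     # gradient of ||x||^p / p

ok = True
for _ in range(200):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    d = np.linalg.norm(x - y)
    g = lambda p: np.linalg.norm(grad_fp(x, p) - grad_fp(y, p))
    ok &= g(2.5) >= 2 ** (2 - 2.5) * d ** (2.5 - 1) - 1e-10   # p >= 2: lower bound
    ok &= g(1.5) <= 2 ** (2 - 1.5) * d ** (1.5 - 1) + 1e-10   # p <= 2: upper bound
print(ok)                                                     # True
```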