Smoothness parameter of power of Euclidean norm

In this paper, we study derivatives of powers of the Euclidean norm. We prove their Hölder continuity and establish explicit expressions for the corresponding constants. We show that these constants are optimal for odd derivatives and at most two times suboptimal for the even ones. In the particular case of integer powers, when Hölder continuity becomes Lipschitz continuity, we improve this result and obtain the optimal constants.


Introduction
Starting from the paper [1], there has been an increasing interest in the cubic regularization of Newton's method (see, for example, [2][3][4][5][6][7][8]), which has some attractive global worst-case complexity guarantees. The main idea of this method is to approximate the objective function with its second-order Taylor approximation, add to it the cube of the Euclidean norm with a certain coefficient, and then minimize the result to obtain a new point.
A natural generalization of this approach consists in considering a general high-order Taylor approximation together with a certain high-order power of the Euclidean norm as a regularizer. This leads to tensor methods [9][10][11][12], which have recently gained popularity after it was shown in [13] that one step of the third-order tensor method for minimizing convex functions is comparable with that of the cubic Newton method.
Affiliation: Center for Operations Research and Economics, Catholic University of Louvain, Louvain-la-Neuve, Belgium.
For some applications, involving functions with Hölder continuous derivatives, it may also be reasonable to regularize the models with fractional degrees of the Euclidean norm, as discussed in [14,15].
The efficiency of all the aforementioned methods strongly depends on our ability to solve the corresponding auxiliary problems that arise at each iteration. Therefore, it is important to be able to quickly solve minimization problems regularized by powers of the Euclidean norm.
Two of the most important characteristics of the objective function that influence the convergence rate of minimization algorithms are the constants of uniform convexity and Hölder continuity of derivatives. It is thus important to know these parameters for powers of Euclidean norm in order to justify the convergence rates of the related minimization algorithms.
The uniform convexity of powers of Euclidean norm was first investigated in [16], where the authors obtained optimal constants for all integer powers. This result was then generalized to arbitrary real powers in [17,Lemma 5]. Thus, the question of uniform convexity is completely solved.
The question of the Hölder continuity of derivatives of powers of Euclidean norm is more subtle. There exist only partial results for some special powers. For example, for any real power between one and two, the Hölder continuity of the first derivative follows from the duality between uniform convexity and Hölder smoothness (see [18,Lemma 1]). For any real power between two and three, the Hölder continuity of the second derivative has recently been proved in [17,Example 2], where some suboptimal constants have been obtained. However, there are currently no general results for an arbitrary power.
Thus, establishing Hölder continuity of derivatives of powers of Euclidean norm and estimating the corresponding constants is still an open problem and constitutes the main topic of this work.
This paper is organized as follows. In Sect. 2, we introduce notation and recall important facts on the norm of symmetric multilinear operators.
In Sect. 3, we derive a general formula for derivatives of powers of Euclidean norm (Theorem 3.1). The main object in this formula is a certain family of recursively defined polynomials (Definition 3.1). We give the corresponding definition and provide several examples.
In Sects. 4 and 5, we study these polynomials in more detail. We establish useful identities and prove several important properties such as symmetry (Proposition 4.1), nonnegativity (Proposition 4.3) and monotonicity (Proposition 4.4). Section 5 is devoted to estimating the Hölder constants of the polynomials. The main results in this section are Theorems 5.1 and 5.2.
In Sect. 6, we apply the auxiliary results obtained in the previous sections for proving Hölder continuity of derivatives of powers of Euclidean norm. Namely, in Theorem 6.1, we derive a lower bound for the possible values of Hölder constants. In Theorem 6.2, we prove Hölder continuity of the derivatives along the lines passing through the origin. Finally, in Theorem 6.3, we extend this result onto the whole space and discuss the optimality of the constants.
Finally, in Sect. 7, we show how to improve our general result for integer powers, when the Hölder condition corresponds to the Lipschitz condition.

Notation and Generalities
In this text, $E$ is a finite-dimensional real vector space. Its dual space, composed of all linear functionals on $E$, is denoted by $E^*$. The value of a linear functional $s \in E^*$, evaluated at a point $x \in E$, is denoted by $\langle s, x \rangle$. To introduce a Euclidean norm $\|\cdot\|$ on $E$, we fix a self-adjoint positive definite operator $B : E \to E^*$ and define
\[ \|x\| := \langle Bx, x \rangle^{1/2}, \qquad x \in E. \]
For a function $f : G \to \mathbb{R}$, defined on an open set $G$ in $E$, and for an integer $p \ge 0$, the $p$th derivative of $f$, if it exists, is denoted by $D^p f$. This derivative is a mapping from $G$ to the space of symmetric $p$-multilinear forms on $E$.
Let $L$ be a $p$-multilinear form on $E$. Its value, evaluated at $h_1, \dots, h_p \in E$, is denoted by $L[h_1, \dots, h_p]$. When $h_1 = \dots = h_p = h$ for some $h \in E$, we abbreviate this as $L[h]^p$. The norm of $L$ is defined in the standard way:
\[ \|L\| := \max_{\|h_1\| = \dots = \|h_p\| = 1} |L[h_1, \dots, h_p]|. \]
If the form $L$ is symmetric, it is known that the maximum in the above definition can be achieved when all the vectors are the same:
\[ \|L\| = \max_{\|h\| = 1} |L[h]^p| \]
(see, for example, Appendix 1 in [19]).
For $q \in \mathbb{R}$, by $f_q : E \to \mathbb{R}$ we denote the $q$th power of the Euclidean norm:
\[ f_q(x) := \|x\|^q. \]
The main goal of this paper is to establish that, for any integer $p \ge 0$ and any real $\nu \in [0, 1]$, the $p$th derivative of $f_{p+\nu}$ is $\nu$-Hölder continuous:
\[ \|D^p f_{p+\nu}(x_2) - D^p f_{p+\nu}(x_1)\| \le A_{p,\nu} \|x_2 - x_1\|^{\nu} \tag{1} \]
for all $x_1, x_2 \in E$, where $A_{p,\nu}$ is an explicit constant dependent on $p$ and $\nu$.

Derivatives of Powers of Euclidean Norm
We start by deriving a general formula for derivatives of the function $f_q$. The main objects in this formula are univariate polynomials, defined below.

Definition 3.1
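For each integer $p \ge 0$ and each $q \in \mathbb{R}$, we define a polynomial $g_{p,q} : \mathbb{R} \to \mathbb{R}$ as follows. When $p = 0$, we set $g_{p,q}(\tau) := 1$. For all other $p \ge 1$,
\[ g_{p,q}(\tau) := (1 - \tau^2)\, g'_{p-1,q}(\tau) + (q - p + 1)\, \tau\, g_{p-1,q}(\tau). \]
Each polynomial $g_{p,q}$ is a combination of the previous polynomial $g_{p-1,q}$ and its derivative $g'_{p-1,q}$. The first five polynomials can be written explicitly:
\[ \begin{aligned} g_{0,q}(\tau) &= 1, \\ g_{1,q}(\tau) &= q\tau, \\ g_{2,q}(\tau) &= q[(q-2)\tau^2 + 1], \\ g_{3,q}(\tau) &= q(q-2)\tau[(q-4)\tau^2 + 3], \\ g_{4,q}(\tau) &= q(q-2)[(q-4)(q-6)\tau^4 + 6(q-4)\tau^2 + 3]. \end{aligned} \]
Let us now describe how derivatives of $f_q$ are related to the polynomials $g_{p,q}$.

Theorem 3.1
For any real $q \in \mathbb{R}$, the function $f_q$ is $p$ times differentiable for all integer $0 \le p < q$. The corresponding derivatives are
\[ D^p f_q(x)[h]^p = g_{p,q}(\tau_h(x))\, \|x\|^{q-p}, \tag{2} \]
where $h \in E$ is an arbitrary unit vector and
\[ \tau_h(x) := \begin{cases} \dfrac{\langle Bx, h \rangle}{\|x\|}, & \text{if } x \ne 0, \\ 0, & \text{if } x = 0. \end{cases} \]
Proof Note that $f_q$ is infinitely differentiable on $E \setminus \{0\}$ since its restriction to this set is a composition of two infinitely differentiable functions, namely the quadratic function $E \setminus \{0\} \to \mathbb{R} : x \mapsto \|x\|^2 = \langle Bx, x \rangle$ and the power function $]0, +\infty[ \to \mathbb{R} : t \mapsto t^{q/2}$. Hence, we only need to prove that $f_q$ is also $p$ times differentiable at the origin for any $0 \le p < q$, and that (2) holds. We proceed by induction. The case $p = 0$ is trivial since, by definition, the zeroth derivative of a function is the function itself, while $g_{0,q}(\tau) = 1$ for any $\tau \in \mathbb{R}$. Let us assume that $p \ge 1$, and the claim is proved for $p' := p - 1$.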
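First, let us justify (2) for any $x \in E \setminus \{0\}$. By the induction hypothesis,
\[ D^{p-1} f_q(x)[h]^{p-1} = g_{p-1,q}(\tau_h(x))\, \|x\|^{q-p+1} \]
for all $x \in E$. On differentiating, we obtain that
\[ D\|x\|[h] = \tau_h(x), \qquad D\tau_h(x)[h] = \frac{1 - \tau_h^2(x)}{\|x\|} \]
for all $x \in E \setminus \{0\}$, and hence,
\[ D^p f_q(x)[h]^p = \bigl[ (1 - \tau_h^2(x))\, g'_{p-1,q}(\tau_h(x)) + (q - p + 1)\, \tau_h(x)\, g_{p-1,q}(\tau_h(x)) \bigr] \|x\|^{q-p} = g_{p,q}(\tau_h(x))\, \|x\|^{q-p}, \]
where the last equality follows from Definition 3.1.
Now let us show that $f_q$ is also $p$ times differentiable at the origin with $D^p f_q(0) = 0$. [This is what (2) says when $x = 0$.] By our inductive assumption, we already know that $D^{p-1} f_q(0) = 0$. Therefore, according to the definition of derivative, it remains to show that
\[ \lim_{x \to 0} \frac{\|D^{p-1} f_q(x)\|}{\|x\|} = 0. \]
Applying our inductive assumption, we obtain that
\[ \frac{\|D^{p-1} f_q(x)\|}{\|x\|} = \|x\|^{q-p} \max_{\|h\| = 1} |g_{p-1,q}(\tau_h(x))| \]
for all $x \in E \setminus \{0\}$. Since $p < q$, we have $\|x\|^{q-p} \to 0$ as $x \to 0$. Thus, we need to show that $|g_{p-1,q}(\tau_h(x))|$ is uniformly bounded for all $x \in E$ and all unit $h \in E$. Indeed, by the Cauchy-Schwarz inequality, we have $|\tau_h(x)| \le 1$. Hence,
\[ |g_{p-1,q}(\tau_h(x))| \le \max_{\tau \in [-1, 1]} |g_{p-1,q}(\tau)|. \]
The right-hand side in the above inequality is finite, since a continuous function always achieves its maximum on a compact interval.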
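The formula of Theorem 3.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $E = \mathbb{R}^2$ with $B = I$, builds $g_{p,q}$ from the recursion $g_{p,q} = (1-\tau^2) g'_{p-1,q} + (q-p+1)\tau g_{p-1,q}$ (our reading of Definition 3.1), and compares $g_{2,q}(\tau_h(x))\|x\|^{q-2}$ with a central finite difference. The helper names `g_coeffs`, `g` and `second_derivative_fd` are hypothetical and not part of the paper.

```python
import math

def g_coeffs(p, q):
    """Coefficients (constant term first) of g_{p,q} for a numeric q."""
    c = [1.0]                                     # g_{0,q} = 1
    for k in range(1, p + 1):
        d = [i * c[i] for i in range(1, len(c))]  # derivative of g_{k-1,q}
        new = [0.0] * (len(c) + 1)
        for i, a in enumerate(d):                 # (1 - tau^2) * g'_{k-1,q}
            new[i] += a
            new[i + 2] -= a
        for i, a in enumerate(c):                 # (q - k + 1) * tau * g_{k-1,q}
            new[i + 1] += (q - k + 1) * a
        c = new
    return c

def g(p, q, tau):
    """Evaluate g_{p,q}(tau) by Horner's rule."""
    val = 0.0
    for a in reversed(g_coeffs(p, q)):
        val = val * tau + a
    return val

def second_derivative_fd(q, x, h, eps=1e-4):
    """Central finite difference for D^2 f_q(x)[h]^2, assuming B = I on R^2."""
    phi = lambda t: math.hypot(x[0] + t * h[0], x[1] + t * h[1]) ** q
    return (phi(eps) - 2.0 * phi(0.0) + phi(-eps)) / eps ** 2

q, x = 3.5, (0.6, -0.3)
h = (math.cos(1.0), math.sin(1.0))                # a unit vector
norm_x = math.hypot(x[0], x[1])
tau = (x[0] * h[0] + x[1] * h[1]) / norm_x        # tau_h(x)
formula = g(2, q, tau) * norm_x ** (q - 2)        # right-hand side of (2), p = 2
print(formula, second_derivative_fd(q, x, h))
```

The two printed numbers should agree up to finite-difference error; higher orders can be checked the same way at the cost of less accurate difference quotients.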
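
Lemma 4.1
For any integer $p \ge 1$, and any $q, \tau \in \mathbb{R}$,
\[ g'_{p,q}(\tau) = (1 - \tau^2)\, g''_{p-1,q}(\tau) + (q - p - 1)\, \tau\, g'_{p-1,q}(\tau) + (q - p + 1)\, g_{p-1,q}(\tau). \tag{5} \]
Proof Follows from Definition 3.1 using standard rules of differentiation.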

Lemma 4.2
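For any integer $p \ge 0$, and any $q, \tau \in \mathbb{R}$,
\[ (q - p)\, g_{p,q}(\tau) = \tau\, g'_{p,q}(\tau) + q\, g_{p,q-2}(\tau). \]
Proof We proceed by induction on $p$. For $p = 0$, by Definition 3.1, we have $(q - p) g_{p,q}(\tau) = q$ while $\tau g'_{p,q}(\tau) = 0$ and $q g_{p,q-2}(\tau) = q$, so the claim is obviously true. Now let us prove the claim for $p \ge 1$, assuming that it is already true for all integer $0 \le p' \le p - 1$. By Definition 3.1, we have
\[ g_{p,q}(\tau) = (1 - \tau^2)\, g'_{p-1,q}(\tau) + (q - p + 1)\, \tau\, g_{p-1,q}(\tau). \]
Rearranging, we obtain
\[ (q - p)\, g_{p,q}(\tau) = (1 - \tau^2)(q - p)\, g'_{p-1,q}(\tau) + (q - p + 1)\, \tau\, g_{p-1,q}(\tau) + (q - p - 1)\, \tau\, (q - p + 1)\, g_{p-1,q}(\tau). \]
By the induction hypothesis, applied for $p' := p - 1$, we have
\[ (q - p + 1)\, g_{p-1,q}(\tau) = \tau\, g'_{p-1,q}(\tau) + q\, g_{p-1,q-2}(\tau) \]
for all $\tau \in \mathbb{R}$. Differentiating both sides, we obtain from this that
\[ (q - p)\, g'_{p-1,q}(\tau) = \tau\, g''_{p-1,q}(\tau) + q\, g'_{p-1,q-2}(\tau). \]
Combining the above three formulas, we see that
\[ (q - p)\, g_{p,q}(\tau) = (1 - \tau^2)\bigl[ \tau g''_{p-1,q}(\tau) + q g'_{p-1,q-2}(\tau) \bigr] + (q - p + 1)\, \tau\, g_{p-1,q}(\tau) + (q - p - 1)\, \tau \bigl[ \tau g'_{p-1,q}(\tau) + q g_{p-1,q-2}(\tau) \bigr]. \tag{6} \]
At the same time, by Lemma 4.1, we have
\[ \tau\, g'_{p,q}(\tau) = (1 - \tau^2)\, \tau\, g''_{p-1,q}(\tau) + (q - p - 1)\, \tau^2 g'_{p-1,q}(\tau) + (q - p + 1)\, \tau\, g_{p-1,q}(\tau) \]
and, by Definition 3.1, we also have
\[ q\, g_{p,q-2}(\tau) = (1 - \tau^2)\, q\, g'_{p-1,q-2}(\tau) + (q - p - 1)\, q\, \tau\, g_{p-1,q-2}(\tau). \]
Summing the above two identities, we obtain the right-hand side of (6).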

Lemma 4.3
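For any integer $p \ge 1$, and any $q, \tau \in \mathbb{R}$,
\[ g'_{p,q}(\tau) = (1 - \tau^2)\, g''_{p-1,q}(\tau) + (q - p)\, \tau\, g'_{p-1,q}(\tau) + q\, g_{p-1,q-2}(\tau). \]
Proof Apply Lemma 4.2 to the last term in (5).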
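The following lemma is particularly interesting. It turns out that, up to a constant factor, the derivative of the polynomial $g_{p,q}$ is exactly the previous polynomial but with a shifted value of $q$.

Lemma 4.4
For any integer $p \ge 1$, and any $q, \tau \in \mathbb{R}$,
\[ g'_{p,q}(\tau) = p\, q\, g_{p-1,q-2}(\tau). \]
Proof We proceed by induction on $p$. Let $\tau \in \mathbb{R}$. For $p = 1$, we know from Definition 3.1 that $g_{p,q}(\tau) = q\tau$, while $p q g_{p-1,q-2}(\tau) = q$; therefore, the claim is indeed true. Now let us prove the claim for $p \ge 2$, assuming that it is already proved for all integer $0 \le p' \le p - 1$. From Lemma 4.3, we already know that
\[ g'_{p,q}(\tau) = (1 - \tau^2)\, g''_{p-1,q}(\tau) + (q - p)\, \tau\, g'_{p-1,q}(\tau) + q\, g_{p-1,q-2}(\tau). \]
Therefore, it remains to prove that
\[ (1 - \tau^2)\, g''_{p-1,q}(\tau) + (q - p)\, \tau\, g'_{p-1,q}(\tau) = (p - 1)\, q\, g_{p-1,q-2}(\tau). \]
By the induction hypothesis for $p' := p - 1$, we already have the identity
\[ g'_{p-1,q}(\tau) = (p - 1)\, q\, g_{p-2,q-2}(\tau). \]
It remains to verify that
\[ (1 - \tau^2)\, g'_{p-2,q-2}(\tau) + (q - p)\, \tau\, g_{p-2,q-2}(\tau) = g_{p-1,q-2}(\tau). \]
But this is given directly by Definition 3.1.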
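The identity described in this lemma, $g'_{p,q} = p\,q\,g_{p-1,q-2}$, is easy to test coefficient by coefficient. The following is a minimal sketch, assuming the recursion of Definition 3.1 in the form $g_{p,q} = (1-\tau^2) g'_{p-1,q} + (q-p+1)\tau g_{p-1,q}$; the function names are hypothetical.

```python
def g_coeffs(p, q):
    """Coefficients (constant term first) of g_{p,q} for a numeric q."""
    c = [1.0]                                     # g_{0,q} = 1
    for k in range(1, p + 1):
        d = [i * c[i] for i in range(1, len(c))]  # derivative of g_{k-1,q}
        new = [0.0] * (len(c) + 1)
        for i, a in enumerate(d):                 # (1 - tau^2) * g'_{k-1,q}
            new[i] += a
            new[i + 2] -= a
        for i, a in enumerate(c):                 # (q - k + 1) * tau * g_{k-1,q}
            new[i + 1] += (q - k + 1) * a
        c = new
    return c

def derivative(c):
    """Coefficients of the derivative of a polynomial."""
    return [i * c[i] for i in range(1, len(c))]

def identity_gap(p, q):
    """Largest coefficient gap between g'_{p,q} and p*q*g_{p-1,q-2} (p >= 1)."""
    lhs = derivative(g_coeffs(p, q))
    rhs = [p * q * a for a in g_coeffs(p - 1, q - 2)]
    return max(abs(a - b) for a, b in zip(lhs, rhs))

for p in range(1, 6):
    print(p, identity_gap(p, 4.3))
```

All printed gaps should be at the level of floating-point rounding.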
Combined with Definition 3.1, Lemma 4.4 gives us a useful recursive formula for $g_{p,q}$ that does not involve any derivatives.

Lemma 4.5
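For any integer $p \ge 2$, and any $q, \tau \in \mathbb{R}$,
\[ g_{p,q}(\tau) = (p - 1)\, q\, (1 - \tau^2)\, g_{p-2,q-2}(\tau) + (q - p + 1)\, \tau\, g_{p-1,q}(\tau). \]
Lemma 4.5 has several corollaries. The first one gives us closed-form expressions for the values of $g_{p,q}$ at the boundary points of the interval $[0, 1]$.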

Proposition 4.2
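For any integer $p \ge 0$, and any $q \in \mathbb{R}$, we have
\[ g_{p,q}(0) = \begin{cases} \prod_{i=0}^{p/2-1} (p - 2i - 1)(q - 2i), & \text{if $p$ is even}, \\ 0, & \text{if $p$ is odd}, \end{cases} \tag{8} \]
and
\[ g_{p,q}(1) = \prod_{i=0}^{p-1} (q - i). \]
Proof We proceed by induction on $p$. From Definition 3.1, we have $g_{0,q}(0) = g_{0,q}(1) = 1$ and $g_{1,q}(0) = 0$, $g_{1,q}(1) = q$. Thus, the claim is indeed true for $p = 0$ and $p = 1$. Now let us prove the claim for $p \ge 2$, assuming that it is already true for all integer $0 \le p' \le p - 1$. Using Lemma 4.5, we obtain
\[ g_{p,q}(0) = (p - 1)\, q\, g_{p-2,q-2}(0), \qquad g_{p,q}(1) = (q - p + 1)\, g_{p-1,q}(1). \tag{10} \]
By the induction hypothesis, applied for $p' := p - 2$ (and $q' := q - 2$), we have
\[ g_{p-2,q-2}(0) = \begin{cases} \prod_{i=0}^{p/2-2} (p - 2i - 3)(q - 2i - 2), & \text{if $p$ is even}, \\ 0, & \text{if $p$ is odd}. \end{cases} \]
By shifting the index in the product, this can be rewritten as
\[ g_{p-2,q-2}(0) = \begin{cases} \prod_{i=1}^{p/2-1} (p - 2i - 1)(q - 2i), & \text{if $p$ is even}, \\ 0, & \text{if $p$ is odd}. \end{cases} \]
Substituting this into (10), we obtain (8). The formula for $g_{p,q}(1)$ follows in the same way from the second identity in (10) and the induction hypothesis applied for $p' := p - 1$.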
The second corollary of Lemma 4.5 states that $g_{p,q}$ cannot take negative values on the interval $[0, 1]$, provided that $q$ is sufficiently large.
Combining Proposition 4.3 with Lemma 4.4, we obtain that, when $q \ge p$, the polynomial $g_{p,q}$ is not only nonnegative but also monotonically increasing.

Proposition 4.4
For any integer $p \ge 0$, and any real $q \ge p$, the derivative $g'_{p,q}$ is nonnegative on $[0, 1]$; hence $g_{p,q}$ is monotonically increasing on $[0, 1]$.
Finally, let us show how we can apply the properties established above to find the maximal absolute value of $g_{p,q}$ on $[-1, 1]$.

Proposition 4.5
For any integer $p \ge 0$, and any real $q \ge p$,
\[ \max_{\tau \in [-1, 1]} |g_{p,q}(\tau)| = g_{p,q}(1) = \prod_{i=0}^{p-1} (q - i). \]
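The properties collected in this section (nonnegativity and monotonicity on $[0, 1]$, and the location of the maximal absolute value on $[-1, 1]$) can be observed on a grid. The sketch below is illustrative only. It assumes the recursion of Definition 3.1 in the form $g_{p,q} = (1-\tau^2) g'_{p-1,q} + (q-p+1)\tau g_{p-1,q}$ and uses the identity $g_{p,q}(1) = q(q-1)\cdots(q-p+1)$, which is what the $p$th derivative of $t \mapsto t^q$ at $t = 1$ gives. All function names are hypothetical.

```python
def g_coeffs(p, q):
    """Coefficients (constant term first) of g_{p,q} for a numeric q."""
    c = [1.0]                                     # g_{0,q} = 1
    for k in range(1, p + 1):
        d = [i * c[i] for i in range(1, len(c))]  # derivative of g_{k-1,q}
        new = [0.0] * (len(c) + 1)
        for i, a in enumerate(d):                 # (1 - tau^2) * g'_{k-1,q}
            new[i] += a
            new[i + 2] -= a
        for i, a in enumerate(c):                 # (q - k + 1) * tau * g_{k-1,q}
            new[i + 1] += (q - k + 1) * a
        c = new
    return c

def g(p, q, tau):
    """Evaluate g_{p,q}(tau) by Horner's rule."""
    val = 0.0
    for a in reversed(g_coeffs(p, q)):
        val = val * tau + a
    return val

def falling_factorial(q, p):
    """q (q - 1) ... (q - p + 1); equals 1 for p = 0."""
    out = 1.0
    for i in range(p):
        out *= q - i
    return out

def summarize(p, q, n=400):
    """Grid estimates: max |g| on [-1,1], min of g on [0,1], min increment on [0,1]."""
    full = [g(p, q, i / n) for i in range(-n, n + 1)]
    right = full[n:]                              # values on [0, 1]
    steps = [b - a for a, b in zip(right, right[1:])]
    return max(abs(v) for v in full), min(right), min(steps)

p, nu = 3, 0.5
print(summarize(p, p + nu), falling_factorial(p + nu, p))
```

For $q \ge p$ the first entry of the tuple should match the falling factorial, and the other two entries should be nonnegative up to rounding.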

Hölder Constants of Polynomials
We continue our study of the polynomials $g_{p,q}$, but now we restrict our attention to the particular case when $q = p + \nu$ for some real $\nu \in [0, 1]$.
Clearly, the polynomial $g_{p,p+\nu}$ is $\nu$-Hölder continuous on $[-1, 1]$, since this is true for any polynomial on a compact interval. The goal of this section is to obtain an explicit expression for the corresponding Hölder constant. We start with a result that allows us to reduce our task to that on $[0, 1]$.
Our next task is to estimate the Hölder constant of $g_{p,p+\nu}$ on $[0, 1]$:
\[ H_{p,\nu} := \sup_{0 \le \tau_1 < \tau_2 \le 1} \frac{g_{p,p+\nu}(\tau_2) - g_{p,p+\nu}(\tau_1)}{(\tau_2 - \tau_1)^{\nu}}. \]
Note that Proposition 4.4 allows us to remove the absolute value sign.

Theorem 5.2
For any integer p ≥ 0, and any real ν ∈ [0, 1], we have The proof of Theorem 5.2 is based on two auxiliary propositions.
Let us assume for a moment that these propositions are already proved. Then, the proof of Theorem 5.2 is simple.

But this follows from Proposition 5.2.
Our goal now is to prove Propositions 5.1 and 5.2. We start with Proposition 5.1. It requires three technical lemmas.
Since $\tau_2 \ge \tau_1$, then Applying the inductive assumption to $p' := p - 2$, we obtain Hence, Thus, to finish the proof, it remains to show that But this is guaranteed by Lemma 5.1.
Proof The function (19) is differentiable with derivative which is non-positive on $]0, \tau_2]$ by Lemma 5.2.
Now we can present the proof of Proposition 5.1.
Proof Since (14) is differentiable, it suffices to prove that its derivative is nonnegative for all $0 < \tau_1 < \tau_2 \le 1$, or, equivalently, that By Lemma 4.2, Therefore, it is enough to prove that or, equivalently, But this immediately follows from Lemma 5.3 using (20).
It remains to prove Proposition 5.2. For this, we need one more lemma.

From this, it follows that
Substituting the above equation into (23), we obtain
Proof Since (15) is differentiable, it suffices to prove that its derivative is non-positive for all $0 < \tau < 1$. By Lemma 4.2, we have Thus, we need to show that or, equivalently (by multiplying both sides by $(1 - \tau)^{1-\nu}$), that or, equivalently (by moving the first term into the right-hand side), that But this is given by Lemma 5.4.
To conclude this section, let us discuss the optimality of Theorem 5.2.
The corresponding optimal constant, according to Proposition 5.1, is Note that this maximization problem is logarithmically concave in $\tau$. Taking the logarithm and setting the derivative to zero, we find that the maximal point corresponds to $\tau := \frac{\nu}{2 - \nu} \in [0, 1]$, and the corresponding optimal value is Of course, the last inequality is strict for all $0 \le \nu < 1$.
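The stationary point $\tau = \nu/(2-\nu)$ can be reproduced by hand in the case $p = 2$, where $g_{2,2+\nu}(\tau) = (\nu+2)(\nu\tau^2 + 1)$ by Definition 3.1. The derivation below is a sketch under one assumption: that the quantity being maximized is the ratio $(g_{2,2+\nu}(1) - g_{2,2+\nu}(\tau))/(1-\tau)^{\nu}$ over $\tau \in [0, 1[$, which is our reading of how Proposition 5.1 is used here.

```latex
\begin{align*}
\frac{g_{2,2+\nu}(1) - g_{2,2+\nu}(\tau)}{(1-\tau)^{\nu}}
  &= \nu(\nu+2)\,\frac{1-\tau^2}{(1-\tau)^{\nu}}
   = \nu(\nu+2)\,(1+\tau)(1-\tau)^{1-\nu}, \\
\frac{d}{d\tau}\Bigl[\ln(1+\tau) + (1-\nu)\ln(1-\tau)\Bigr]
  &= \frac{1}{1+\tau} - \frac{1-\nu}{1-\tau} = 0
  \quad\Longleftrightarrow\quad \tau = \frac{\nu}{2-\nu}, \\
(1+\tau)(1-\tau)^{1-\nu}\Big|_{\tau = \nu/(2-\nu)}
  &= \frac{2}{2-\nu}\left(\frac{2(1-\nu)}{2-\nu}\right)^{1-\nu}
   = \frac{2^{2-\nu}(1-\nu)^{1-\nu}}{(2-\nu)^{2-\nu}}.
\end{align*}
```

The logarithm in the second line is a sum of two concave functions of $\tau$ for $\nu \in [0, 1]$, which is why a single stationary point suffices.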

Hölder Continuity of Derivatives of Powers of Euclidean Norm
We have established the main properties of polynomials g p,q and obtained an explicit upper bound on their Hölder constant. Hence, we are ready to prove the Hölder continuity of derivatives of powers of Euclidean norm. Let us start with a simple result that gives us a lower bound on the Hölder constant.
Proof According to (1), we need to show that for some $x_1, x_2 \in E$ and some unit $h \in E$. Let us choose an arbitrary unit vector $h \in E$, and set $x_2 := h$. By Theorem 3.1 and Proposition 4.2, To specify $x_1$, we consider two cases.
Next we prove Hölder continuity with the optimal constant along any line passing through the origin.

Theorem 6.2
For any integer $p \ge 0$, and any real $\nu \in [0, 1]$, the restriction of $D^p f_{p+\nu}$ to a line passing through the origin is $\nu$-Hölder continuous with constant $C_{p,\nu}$.
Proof Let $x_1, x_2 \in E$ be arbitrary points lying on a line passing through the origin, and let $h \in E$ be an arbitrary unit vector. According to (1) and Theorem 3.1, we need to show that
\[ \bigl| g_{p,p+\nu}(\tau_2)\, \|x_2\|^{\nu} - g_{p,p+\nu}(\tau_1)\, \|x_1\|^{\nu} \bigr| \le C_{p,\nu}\, \|x_2 - x_1\|^{\nu}, \]
where $\tau_1 := \tau_h(x_1)$ and $\tau_2 := \tau_h(x_2)$. Observe that this inequality is symmetric in $x_1$ and $x_2$ and is invariant when we replace the pair $(x_1, x_2)$ with $(-x_1, -x_2)$. Therefore, we can assume that $\|x_2\| \ge \|x_1\|$ and $\tau_2 \ge 0$.
Comparing the result of Theorem 6.3 with the lower bound $C_{p,\nu}$, given by Theorem 6.1, we see that for odd values of $p$, the constant $\tilde{A}_{p,\nu}$ is optimal. Unfortunately, this is no longer true for even values of $p$. Nevertheless, the constant $\tilde{A}_{p,\nu}$ is still quite accurate. Indeed, since Thus, the constant $\tilde{A}_{p,\nu}$ is at most two times suboptimal: $\tilde{A}_{p,\nu} \le 2 C_{p,\nu}$.
One may think that the reason why we obtained a suboptimal bound for even values of $p$ is that we used a suboptimal value for the Hölder constant $H_{p,\nu}$ of the polynomial $g_{p,p+\nu}$ (see the corresponding discussion at the end of Sect. 5). However, this is not the actual reason. Indeed, let us look at what happens when we use the optimal value for $H_{p,\nu}$ in the particular case $p = 2$. Recall that the optimal constant in this case is Substituting this expression into (27), we obtain an improved estimate However, this new estimate is still different from the lower bound $C_{2,\nu} = (\nu + 1)(\nu + 2)$.
At the same time, for small values of $\nu$, the difference between $A_{p,\nu}$ and $C_{p,\nu}$ is almost negligible.
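For $p = 2$, the lower bound $C_{2,\nu} = (\nu+1)(\nu+2)$ and the factor-of-two gap can be explored numerically. The sketch below rests on several assumptions that are ours, not the paper's: $E = \mathbb{R}^2$ with $B = I$, the closed-form Hessian $D^2 f_q(x) = q\|x\|^{q-2} I + q(q-2)\|x\|^{q-4} x x^{T}$ for $x \ne 0$, and a random sample of point pairs. The names `hessian`, `sym_norm` and `holder_ratio` are hypothetical.

```python
import math
import random

def hessian(x, q):
    """Hessian of f_q(x) = ||x||^q on R^2 with B = I, as (h11, h12, h22); assumes q > 2."""
    r = math.hypot(x[0], x[1])
    if r == 0.0:
        return (0.0, 0.0, 0.0)                    # D^2 f_q(0) = 0 when q > 2
    a = q * r ** (q - 2)
    b = q * (q - 2) * r ** (q - 4)
    return (a + b * x[0] * x[0], b * x[0] * x[1], a + b * x[1] * x[1])

def sym_norm(m):
    """Spectral norm of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return max(abs(mid + rad), abs(mid - rad))

def holder_ratio(x1, x2, nu):
    """||D^2 f(x2) - D^2 f(x1)|| / ||x2 - x1||^nu for f = f_{2+nu}."""
    q = 2.0 + nu
    diff = tuple(u - v for u, v in zip(hessian(x2, q), hessian(x1, q)))
    dist = math.hypot(x2[0] - x1[0], x2[1] - x1[1])
    return sym_norm(diff) / dist ** nu

nu = 0.5
lower = (nu + 1.0) * (nu + 2.0)                   # C_{2,nu}
at_origin = holder_ratio((0.0, 0.0), (math.cos(0.7), math.sin(0.7)), nu)

rng = random.Random(0)                            # fixed seed for reproducibility
def random_point():
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
sampled = max(holder_ratio(random_point(), random_point(), nu) for _ in range(2000))
print(lower, at_origin, sampled)
```

The pair $x_1 = 0$, $x_2 = h$ attains the ratio $(\nu+1)(\nu+2)$ exactly, while the sampled maximum only gives an empirical indication of how close the true constant is to this lower bound.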

Lipschitz Constants of Derivatives of Powers of Euclidean Norm
For even values of $p$, our estimate $A_{p,\nu}$ of the Hölder constant of $D^p f_{p+\nu}$ was suboptimal. It turns out that in the special case when $\nu = 1$, it is actually very simple to eliminate this drawback and obtain an optimal constant for all values of $p$. This case corresponds to Lipschitz continuity.

Conclusions
In this work, we have proved that derivatives of powers of Euclidean norm are Hölder continuous and have obtained explicit expressions for the corresponding Hölder constants. We have shown that our constants are optimal for odd derivatives and at most two times suboptimal for the even ones. In the particular case of integer powers, when the Hölder condition corresponds to the Lipschitz condition, we have managed to improve our result and obtained optimal constants in all cases. We believe that in general, it should be possible to obtain optimal constants for even derivatives as well. However, this seems to be a difficult problem.