An extension of the proximal point algorithm beyond convexity

We introduce and investigate a new generalized convexity notion for functions called prox-convexity. The proximity operator of such a function is single-valued and firmly nonexpansive. We provide examples of (strongly) quasiconvex, weakly convex, and DC (difference of convex) functions that are prox-convex; however, none of these classes fully contains the class of prox-convex functions, nor is it contained in any of them. We show that the classical proximal point algorithm remains convergent when the convexity of the proper lower semicontinuous function to be minimized is relaxed to prox-convexity.


Introduction
The first motivation behind this study comes from works like [12,19,22,23], where proximal point type methods for minimizing quasiconvex functions formulated by means of Bregman distances were proposed. On the other hand, other extensions of the proximal point algorithm for nonconvex optimization problems (such as the ones introduced in [10,18,20,24]) cannot be employed in such situations for various reasons. Looking for a way to reconcile these approaches, we came across a new class of generalized convex functions that we call prox-convex, whose properties allowed us to extend the convergence of the classical proximal point algorithm beyond the convex setting in a yet unexplored direction.
In contrast to other similar generalizations, the proximity operators of proper prox-convex functions are single-valued (and firmly nonexpansive) on the underlying sets. To the best of our knowledge, besides the convex and prox-convex functions only the weakly convex ones have single-valued proximity operators (cf. [16]). This property plays a crucial role in the construction of proximal point type algorithms, as the new iterate is thus uniquely determined and does not have to be picked from a set. Moreover, the prox-convexity of a function can be considered either globally or on a subset of its domain, which can be advantageous when dealing with concrete applications. Various functions, among which several families of (strongly) quasiconvex, weakly convex and DC (i.e., difference of convex) ones, fulfill the definition of the new notion we propose. As a byproduct of our study we also deliver new results involving (strongly) quasiconvex functions.
Unlike other extensions of the proximal point algorithm, the one we propose has a local nature of sorts, not in the sense of properties of a function holding in some neighborhood, but concerning the restriction of the function to a (convex) set. We are not aware of closely related work in the literature where the proximity operator of a function is taken with respect to a given set; however, in works like [6,13] such constructions, with some of the employed functions not split from the corresponding sets, were already considered.
Given a proper, lower semicontinuous and convex function h : R^n → R̄ := R ∪ {±∞}, for any z ∈ R^n the minimization problem

min_{x ∈ R^n} { h(x) + (1/2) ||x − z||² } (1.1)

has (even in more general frameworks such as Hilbert spaces) a unique optimal solution, denoted by Prox_h(z), which is the value of the proximity operator of the function h at the point z. A fundamental property of the latter is

x̄ = Prox_h(z) ⟺ z − x̄ ∈ ∂h(x̄), for z, x̄ ∈ R^n, (1.2)

(see, for instance, [5, Proposition 12.26]), where ∂h is the usual convex subdifferential. These two facts (the existence of an optimal solution to (1.1) and the characterization (1.2)) are crucial tools for proving the convergence of proximal point type algorithms for continuous optimization problems consisting in minimizing (sums of) proper, lower semicontinuous and convex functions, and even for DC programming problems (see [4] for instance). For the class of prox-convex functions introduced in this article the first of them holds, while the second one is replaced by a weaker variant, and we show that these properties still guarantee the convergence of the sequence generated by the proximal point algorithm towards a minimum of a prox-convex function.
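For a proper, lower semicontinuous and convex h, the characterization (1.2) follows from the standard first-order optimality condition for (1.1); a short derivation (a sketch, using the sum rule for the convex subdifferential, applicable here since the quadratic term is everywhere finite and continuous):

```latex
\bar{x} = \operatorname{Prox}_h(z)
\iff 0 \in \partial\Big( h + \tfrac{1}{2}\|\cdot - z\|^2 \Big)(\bar{x})
     = \partial h(\bar{x}) + \bar{x} - z
\iff z - \bar{x} \in \partial h(\bar{x}).
```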
The paper is organized as follows. After some preliminaries, where we fix the framework and recall some necessary notions and results, we introduce and investigate the new classes of prox-convex functions and strongly G-subdifferentiable functions, showing that the proper and lower semicontinuous elements of the latter class belong to the former one, too. Finally, we show that the classical proximal point algorithm can be extended to the prox-convex setting without losing convergence.

Preliminaries
By ⟨·, ·⟩ we mean the inner product of R^n and by || · || the Euclidean norm on R^n. Let K be a nonempty set in R^n; we denote its topological interior by int K and its boundary by bd K. The indicator function of K is defined by δ_K(x) := 0 if x ∈ K, and δ_K(x) := +∞ otherwise. By B(x, δ) we mean the closed ball with center x ∈ R^n and radius δ > 0. By Id : R^n → R^n we denote the identity mapping on R^n.
Given any x, y, z ∈ R^n, we have

2⟨x − z, y − z⟩ = ||x − z||² + ||y − z||² − ||x − y||².

For any x, y ∈ R^n and any β ∈ R, we have

||βx + (1 − β)y||² = β||x||² + (1 − β)||y||² − β(1 − β)||x − y||².

Given any extended-valued function h : R^n → R̄ := R ∪ {±∞}, the effective domain of h is defined by dom h := {x ∈ R^n : h(x) < +∞}. We say that h is proper if dom h is nonempty and h(x) > −∞ for all x ∈ R^n.
We denote by epi h := {(x, λ) ∈ R^n × R : h(x) ≤ λ} the epigraph of h, by S_λ(h) := {x ∈ R^n : h(x) ≤ λ} (respectively S^<_λ(h) := {x ∈ R^n : h(x) < λ}) the sublevel (respectively strict sublevel) set of h at the height λ ∈ R, and by arg min_{R^n} h the set of all minimal points of h. We say that a function is L-Lipschitz when it is Lipschitz continuous with constant L > 0. We adopt the usual conventions sup_∅ h := −∞ and inf_∅ h := +∞.
A function h with a convex domain is said to be
(a) convex if, given any x, y ∈ dom h and any λ ∈ [0, 1], h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y);
(b) quasiconvex if, given any x, y ∈ dom h and any λ ∈ [0, 1], h(λx + (1 − λ)y) ≤ max{h(x), h(y)};
(c) semistrictly quasiconvex if, given any x, y ∈ dom h with h(x) ≠ h(y) and any λ ∈ ]0, 1[, h(λx + (1 − λ)y) < max{h(x), h(y)}.
We say that h is strictly quasiconvex if the inequality in (b) is strict whenever x ≠ y and λ ∈ ]0, 1[. For algorithmic purposes, the following notions from [5, Definition 10.27] (see also [29,30]) are useful.
A function h with a convex domain is said to be strongly convex (respectively strongly quasiconvex) if there exists β ∈ ]0, +∞[ such that for all x, y ∈ dom h and all λ ∈ [0, 1] we have

h(λy + (1 − λ)x) ≤ λh(y) + (1 − λ)h(x) − λ(1 − λ)(β/2)||y − x||²,

respectively

h(λy + (1 − λ)x) ≤ max{h(y), h(x)} − λ(1 − λ)(β/2)||y − x||². (2.7)

For (2.7), sometimes one needs to restrict the value β to a subset J of ]0, +∞[, and then h is said to be strongly quasiconvex for J.
Every strongly convex function is strongly quasiconvex, and every strongly quasiconvex function is semistrictly quasiconvex. Furthermore, a strongly quasiconvex function has at most one minimizer on a convex set K ⊆ R^n that intersects its domain (see [5, Proposition 11.8]). The convex subdifferential of h at x is

∂h(x) := {ξ ∈ R^n : ⟨ξ, y − x⟩ ≤ h(y) − h(x) for all y ∈ R^n}

when x ∈ dom h, and ∂h(x) := ∅ if x ∉ dom h. But in the case of nonconvex functions (quasiconvex ones, for instance) the convex subdifferential is too small and often empty, so other subdifferential notions (see [14,25]) become necessary, like the Gutiérrez subdifferential (of h at x), defined by

∂^≤ h(x) := {ξ ∈ R^n : ⟨ξ, y − x⟩ ≤ h(y) − h(x) for all y ∈ S_{h(x)}(h)}

when x ∈ dom h, and the Plastria subdifferential (of h at x), defined by

∂^< h(x) := {ξ ∈ R^n : ⟨ξ, y − x⟩ ≤ h(y) − h(x) for all y ∈ S^<_{h(x)}(h)}

when x ∈ dom h, both being empty at points outside dom h. One has ∂h(x) ⊆ ∂^≤ h(x) ⊆ ∂^< h(x) for all x ∈ R^n. The reverse inclusions do not hold, as the function h : R → R given by h(x) = min{x, max{x − 1, 0}} shows (see [26, page 21]). A sufficient condition for equality in this inclusion chain is given in [26, Proposition 10]. Note that both ∂^≤ h and ∂^< h are (at any point) either empty or unbounded (see [14,25,26]). We recall the following results, originally given in [25, Theorem 2.3], [31, Proposition 2.5 and Proposition 2.6] and [9, Theorem 20], respectively. Lemma 2.1. Let h : R^n → R̄ be a proper function. The following results hold.
For γ > 0 we define the Moreau envelope of parameter γ of h by

^γ h(x) := inf_{y ∈ R^n} { h(y) + (1/(2γ)) ||y − x||² }.

The proximity operator of parameter γ > 0 of a function h : R^n → R̄ at x ∈ R^n is defined as

Prox_{γh}(x) := arg min_{y ∈ R^n} { h(y) + (1/(2γ)) ||y − x||² }. (2.17)

When h is proper, convex and lower semicontinuous, Prox_{γh} turns out to be a single-valued operator. By a slight abuse of notation, when Prox_{γh} is single-valued we write in this paper Prox_{γh}(z) (for some z ∈ R^n) to identify the unique element of the actual set Prox_{γh}(z). Moreover, when γ = 1 we write Prox_h instead of Prox_{1h}.
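For a concrete instance of (2.17), the proximity operator of γ|·| on R is the well-known soft-thresholding map, which can be checked against a brute-force minimization on a grid; a minimal sketch (the grid resolution and test points are arbitrary choices):

```python
import numpy as np

def prox_abs(z, gamma):
    # Closed-form proximity operator of gamma*|.| on R (soft-thresholding),
    # a standard fact for this particular convex function.
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def prox_numeric(h, z, gamma, grid):
    # Brute-force evaluation of (2.17) on a grid of candidate points.
    vals = [h(x) + (x - z) ** 2 / (2 * gamma) for x in grid]
    return grid[int(np.argmin(vals))]

grid = np.linspace(-5, 5, 200001)
for z in [-3.0, -0.4, 0.0, 1.7]:
    closed_form = prox_abs(z, 1.0)
    numeric = prox_numeric(abs, z, 1.0, grid)
    assert abs(closed_form - numeric) < 1e-3
```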
For studying constrained optimization problems, the use of constrained notions becomes important, since they require weaker conditions. By ∂_K h(x), ∂^≤_K h(x) and ∂^<_K h(x) we mean the convex, Gutiérrez and Plastria subdifferentials of h at x ∈ K restricted to the set K, that is, the sets obtained by restricting the points y in the corresponding definitions to K. (2.18) An operator T : K → R^n is said to be
(a) nonexpansive if for every x, y ∈ K, ||T x − T y|| ≤ ||x − y||;
(b) firmly nonexpansive if for every x, y ∈ K,

||T x − T y||² + ||(Id − T)x − (Id − T)y||² ≤ ||x − y||².

According to [5, Proposition 4.4], T is firmly nonexpansive if and only if

||T x − T y||² ≤ ⟨x − y, T x − T y⟩ for every x, y ∈ K.

As a consequence, if T is firmly nonexpansive, then T is Lipschitz continuous and monotone.
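As an illustration, the projection onto a closed convex set is the prototypical firmly nonexpansive operator; the characterization from [5, Proposition 4.4] can be checked numerically for the projection onto the interval [0, 1] (a sketch; the interval and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def T(x):
    # Projection onto the interval [0, 1]; projections onto closed convex
    # sets are standard examples of firmly nonexpansive operators.
    return min(max(x, 0.0), 1.0)

for _ in range(10000):
    x, y = rng.uniform(-5, 5, size=2)
    # characterization: ||Tx - Ty||^2 <= <x - y, Tx - Ty>
    lhs = (T(x) - T(y)) ** 2
    rhs = (x - y) * (T(x) - T(y))
    assert lhs <= rhs + 1e-12
```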

Prox-convex functions
In this section, we introduce and study a class of functions for which the necessary fundamental properties presented in the introduction are satisfied.

Motivation, definition and basic properties
We begin with the following result, in which we provide a general sufficient condition for the nonemptiness of the values of the proximity operator.
Proposition 3.1. Let h : R^n → R̄ be a proper, lower semicontinuous and 2-weakly coercive function. Given any z ∈ R^n, there exists x̄ ∈ Prox_h(z).
Proof. Given z ∈ R^n, we consider the minimization problem min_{x ∈ R^n} h_z(x), where h_z(x) := h(x) + (1/2)||x − z||². Since h is lower semicontinuous and 2-weakly coercive, h_z is lower semicontinuous and coercive by [8, Theorem 2(ii)]. Thus, there exists x̄ ∈ R^n such that x̄ ∈ arg min_{R^n} h_z, i.e., x̄ ∈ Prox_h(z).
One cannot weaken the assumptions of Proposition 3.1 without losing its conclusion.
Remark 3.1. (i) Note that every convex function is 2-weakly coercive, and every function that is bounded from below is 2-weakly coercive as well. The function h : R → R given by h(x) = −|x| is 2-weakly coercive, but neither convex nor bounded from below; nevertheless, by Proposition 3.1, Prox_h(z) ≠ ∅ for any z ∈ R. (ii) The 2-weak coercivity assumption cannot be dropped in the general case. Indeed, the function h : R → R given by h(x) = −x³ is continuous and quasiconvex, but fails to be 2-weakly coercive, and for any z ∈ R one has Prox_h(z) = ∅.
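The dichotomy in Remark 3.1 can be observed numerically in one dimension: for h(x) = −|x| the prox objective attains its minimum (at z + 1 when z > 0), while for h(x) = −x³ it is unbounded below, so the proximity operator is empty. A sketch, with an arbitrary grid and the test point z = 1:

```python
import numpy as np

z = 1.0
xs = np.linspace(-100, 100, 400001)

# h(x) = -|x| is 2-weakly coercive: the prox objective is coercive and
# attains its minimum (here at z + 1 = 2 for z = 1 > 0).
obj1 = -np.abs(xs) + 0.5 * (xs - z) ** 2
x_star = xs[np.argmin(obj1)]
assert abs(x_star - (z + 1.0)) < 1e-2

# h(x) = -x**3 is not 2-weakly coercive: the prox objective tends to
# -infinity as x -> +infinity, so Prox_h(z) is empty.
obj2 = -xs ** 3 + 0.5 * (xs - z) ** 2
assert obj2[-1] < -1e5   # already very negative at x = 100
```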
Next we characterize the existence of solutions in the definition of the proximity operator.

Proposition 3.2. Let h : R^n → R̄ be a proper function. Given any z ∈ R^n, one has x̄ ∈ Prox_h(z) if and only if

⟨z − x̄, x − x̄⟩ ≤ h(x) − h(x̄) + (1/2)||x − x̄||² for all x ∈ R^n. (3.2)

Relation (3.2) is too general for providing convergence results for proximal point type algorithms, while relation (1.2) has proven to be extremely useful in the convex case. Motivated by this, we introduce the class of prox-convex functions below. In the following, we write

Prox_h(K, z) := Prox_{(h+δ_K)}(z). (3.3)

Note that closed formulae for the proximity operator of a sum of functions in terms of the proximity operators of the involved functions are known only in the convex case and under demanding hypotheses; see, for instance, [1]. However, constructions like the one in (3.3) can be found in the literature on proximal point methods for solving different classes of (nonconvex) optimization problems; take, for instance, [6,13].

Definition 3.1. Let K be a closed set in R^n and h : R^n → R̄ be a proper function such that K ∩ dom h ≠ ∅. We say that h is prox-convex on K if there exists α > 0 such that for every z ∈ K one has Prox_h(K, z) ≠ ∅ and every x̄ ∈ Prox_h(K, z) fulfills

⟨z − x̄, x − x̄⟩ ≤ α(h(x) − h(x̄)) for all x ∈ K. (3.4)

The set of all prox-convex functions on K is denoted by Φ(K), and the scalar α > 0 for which (3.4) holds is said to be the prox-convex value of the function h on K. When K = R^n we say that h is prox-convex.
Remark 3.2. (i) One can immediately notice that (3.4) is equivalent to a weaker version of (1.2), namely (1/α)(z − x̄) ∈ ∂_K h(x̄) for every x̄ ∈ Prox_h(K, z). (ii) The scalar α > 0 for which (3.4) holds need not be unique. Indeed, if h is convex, then α = 1 works by Proposition 3.4, and due to the convexity of h one has ⟨z − x̄, x − x̄⟩ ≤ h(x) − h(x̄) whenever x̄ ∈ Prox_h(K, z) and x ∈ K; hence, at every x ∈ K where h(x) − h(x̄) ≥ 0, (3.4) is fulfilled for every α ≥ 1 as well. Note, however, that a similar conclusion does not necessarily hold in general, as ⟨z − x̄, x − x̄⟩ might be positive while h(x) − h(x̄) is negative.
(iii) Note also that, at least from the computational point of view, an exact value of α need not be known, as one can see in Section 4.
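For the convex case, where Proposition 3.4 yields the prox-convex value α = 1, the resulting inequality ⟨z − x̄, x − x̄⟩ ≤ h(x) − h(x̄), i.e., z − x̄ ∈ ∂h(x̄), can be verified numerically for h = |·|, whose proximity operator is soft-thresholding (a sketch with arbitrary random sample points):

```python
import numpy as np

rng = np.random.default_rng(1)

def prox_abs(z):
    # Prox of |.| (soft-thresholding with parameter 1).
    return np.sign(z) * max(abs(z) - 1.0, 0.0)

for _ in range(10000):
    z, x = rng.uniform(-10, 10, size=2)
    xb = prox_abs(z)
    # subgradient inequality z - xb in d|.|(xb), the alpha = 1 case
    assert (z - xb) * (x - xb) <= abs(x) - abs(xb) + 1e-12
```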
In the following statement we see that the element-of relation x̄ ∈ Prox_h(K, z) preceding (3.4) can be replaced with an equality, since the proximity operator of a proper prox-convex function is single-valued, and moreover firmly nonexpansive.
Proposition 3.3. Let K be a closed set in R^n and h : R^n → R̄ a proper prox-convex function on K such that K ∩ dom h ≠ ∅. Then the map z ↦ Prox_h(K, z) is single-valued and firmly nonexpansive.
Proof. Suppose that h is a prox-convex function with prox-convex value α > 0 and assume that for some z ∈ K one has two elements x̄₁, x̄₂ ∈ Prox_h(K, z). By (3.4),

⟨z − x̄₁, x − x̄₁⟩ ≤ α(h(x) − h(x̄₁)) for all x ∈ K, (3.5)
⟨z − x̄₂, x − x̄₂⟩ ≤ α(h(x) − h(x̄₂)) for all x ∈ K. (3.6)

Take x = x̄₂ in (3.5) and x = x̄₁ in (3.6). By adding the resulting inequalities, we get 0 ≥ ⟨z − x̄₁, x̄₂ − x̄₁⟩ + ⟨z − x̄₂, x̄₁ − x̄₂⟩ = ||x̄₁ − x̄₂||². Hence x̄₁ = x̄₂, so Prox_h(K, ·) is single-valued. Now take z₁, z₂ ∈ K and x̄ᵢ := Prox_h(K, zᵢ), i = 1, 2. Then

⟨z₁ − x̄₁, x − x̄₁⟩ ≤ α(h(x) − h(x̄₁)) for all x ∈ K, (3.7)
⟨z₂ − x̄₂, x − x̄₂⟩ ≤ α(h(x) − h(x̄₂)) for all x ∈ K. (3.8)

Taking x = x̄₂ in (3.7) and x = x̄₁ in (3.8) and adding them, we obtain ||x̄₁ − x̄₂||² ≤ ⟨z₁ − z₂, x̄₁ − x̄₂⟩. Hence, by [5, Proposition 4.4], Prox_h(K, ·) is firmly nonexpansive.
Next we show that every lower semicontinuous and convex function is prox-convex.
Proposition 3.4. Let K be a closed and convex set in R^n and h : R^n → R̄ be a proper, convex and lower semicontinuous function such that K ∩ dom h ≠ ∅. Then h ∈ Φ(K) with prox-convex value α = 1. Proof. Since h is convex, the function x ↦ h(x) + (β/2)||z − x||² is strongly convex on K for all β > 0 and all z ∈ K, in particular for β = 1. Thus Prox_h(K, z) contains exactly one element, say x̄ ∈ R^n. It follows from [5, Proposition 12.26] that z − x̄ ∈ ∂(h + δ_K)(x̄), so relation (3.4) holds for α = 1. Therefore, h ∈ Φ(K).
Prox-convexity goes beyond convexity as shown below.
In order to formulate a reverse statement of Proposition 3.4, we note that if h : R^n → R̄ is a lower semicontinuous function that is prox-convex on some set K with K ∩ dom h ≠ ∅ and satisfies (3.4) for α = 1, then h is not necessarily convex. Indeed, the function in Example 3.1 satisfies (3.4) for all α > 0, but it is not convex on K = [0, 1].
In the following example, we show that lower semicontinuity is not a necessary condition for prox-convexity. Note also that although the proximity operator of the function mentioned in Remark 3.1(ii) is always empty, this is no longer the case when it is restricted to an interval. Example 3.2. Take n ≥ 3 and K_n := [1, n], and consider a function h_n : K_n → R that is neither convex nor lower semicontinuous, but quasiconvex on K_n. Due to the discontinuity of h_n, the function f_n(x) = h_n(x) + (1/2)x² is neither convex nor lower semicontinuous on K_n; hence h_n is not c-weakly convex (in the sense of [17]) either, and its subdifferential is not hypomonotone (as defined in [10,18,24]). However, for any z ∈ K_n, Prox_{h_n}(K_n, z) = {n}, and h_n turns out to be prox-convex on K_n. Another example of a prox-convex function that is actually (like the one in Example 3.1) both concave and DC follows.

Remark 3.3. (i) One can also construct examples of c-weakly convex functions (for some c > 0) that are not prox-convex; hence these two classes only share some elements, without either being completely contained in the other.
(ii) While Examples 3.1 and 3.3 exhibit prox-convex functions that are also DC, the prox-convex functions presented in Example 3.2 are not DC. Examples of DC functions that are not prox-convex can be constructed as well; consequently, as in the case of c-weakly convex functions, these two classes only share some elements, without either being completely contained in the other. Note, moreover, that, differently from the literature on algorithms for DC optimization problems (see, for instance, [2,4]), where usually only critical points (and not optimal solutions) of such problems are determinable, for DC functions that are also prox-convex, proximal point methods are capable of delivering global minima (on the considered sets).
(iii) The remarkable properties of the Kurdyka-Łojasiewicz (KL) functions have made them a standard tool when discussing proximal point type algorithms for nonconvex functions. As their definition requires proper closedness and the prox-convex functions presented in Example 3.2 are not closed, one can conclude that the class of prox-convex functions is in this sense broader than that of the KL ones. Similarly, one can note that prox-convexity is not directly related to hypomonotonicity of subdifferentials (see [10,18,24], respectively).
(iv) At least due to the similar name, a legitimate question is whether the notion of prox-convexity is connected in any way with prox-regularity (cf. [10,20,24]). While the latter asks a function to be locally lower semicontinuous around a given point, the notion we introduce in this work does not assume any topological properties of the involved function. Another difference can be noticed in Section 4, where we show that the classical proximal point algorithm remains convergent towards a minimum of the function to be minimized even if it lacks convexity but is prox-convex. On the other hand, the iterates of the modified versions of the proximal point method employed for minimizing prox-regular functions converge towards critical points of the latter. Last but not least, note that, while the mentioned works use tools specific to nonsmooth analysis such as generalized subdifferentials, in this paper we employ the convex subdifferential and some subdifferential notions specific to quasiconvex functions.
Necessary and sufficient hypotheses for condition (3.4) are given below.
Proposition 3.5. Let K be a closed set in R^n and h : R^n → R̄ be a proper, lower semicontinuous and prox-convex function such that K ∩ dom h ≠ ∅. Let α > 0 be the prox-convex value of h on K, and z ∈ K. Consider the following assertions. If h is prox-convex with prox-convex value α, then we know that Prox_{(1/α)h} = Prox_h is a singleton, hence ^{1/α}h(z) ∈ R for all z ∈ R^n. Furthermore, we have the following statement. Proposition 3.6. Let h : R^n → R̄ be proper, lower semicontinuous and prox-convex with prox-convex value α > 0 on a closed set K ⊆ R^n such that K ∩ dom h ≠ ∅. Then ^{1/α}h : R^n → R is Fréchet differentiable everywhere. Proof. From (2.17) one obtains an estimate on ^{1/α}h(x) − ^{1/α}h(y); exchanging x with y and x̄ with ȳ yields the opposite estimate, and it follows from the resulting inequalities (3.12) and (3.13) that ^{1/α}h is Fréchet differentiable at every x ∈ R^n.
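In the convex case the differentiability of the Moreau envelope comes with the classical gradient formula ∇(^γ h)(z) = (1/γ)(z − Prox_{γh}(z)); a quick numerical check for h = |·|, whose envelope is the Huber function (a sketch; the parameter γ = 0.5 and the test points are arbitrary, chosen away from the kinks at ±γ):

```python
import numpy as np

gamma = 0.5

def prox(z):
    # Prox of gamma*|.|: soft-thresholding with parameter gamma.
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def env(z):
    # Moreau envelope of |.| with parameter gamma, evaluated at its
    # minimizer prox(z); this is the Huber function.
    x = prox(z)
    return abs(x) + (x - z) ** 2 / (2 * gamma)

for z in [-2.0, -0.3, 0.2, 1.5]:
    eps = 1e-6
    num_grad = (env(z + eps) - env(z - eps)) / (2 * eps)
    # compare the numerical gradient with (z - prox(z)) / gamma
    assert abs(num_grad - (z - prox(z)) / gamma) < 1e-4
```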

Strongly G-subdifferentiable functions
Further we introduce and study a class of quasiconvex functions whose lower semicontinuous members are prox-convex. Definition 3.2. Let K be a closed and convex set in R^n and h : R^n → R̄ be a proper and lower semicontinuous function such that K ∩ dom h ≠ ∅. Next we show that a lower semicontinuous and strongly G-subdifferentiable function on K is prox-convex. Proposition 3.7. Let K be a closed and convex set in R^n and h : R^n → R̄ be a proper and lower semicontinuous function such that K ∩ dom h ≠ ∅. Proof. Let h be a lower semicontinuous and strongly G-subdifferentiable function. Then for every z ∈ K there exists x̄ ∈ K with x̄ = Prox_h(K, z). Hence, given any y ∈ K, we take y_λ = λy + (1 − λ)x̄ with λ ∈ [0, 1]. Thus, by the definition of the proximity operator and the strong quasiconvexity of h on K for some β ≥ 1, we obtain an estimate and distinguish two possible cases.
Remark 3.5. (i) When h : R^n → R̄ is lower semicontinuous and strongly quasiconvex, then, as strongly quasiconvex functions are semistrictly quasiconvex, h is quasiconvex and every local minimum of h is a global minimum, too; hence h is neatly quasiconvex, i.e., ∂^< h = ∂^≤ h (see [26, Proposition 9]). Therefore, we can replace ∂^≤_K h by ∂^<_K h in condition (3.14).
(ii) Strongly G-subdifferentiable functions are not necessarily convex as the function in Example 3.1 shows.
A family of prox-convex functions that are not strongly G-subdifferentiable can be found in Remark 3.6; see also Example 3.2. Now we study lower semicontinuous strongly quasiconvex functions for which the Gutiérrez subdifferential is nonempty. To that end, we first recall the following definitions (adapted after [11, Definition 3.1]). Definition 3.3. Let K be a nonempty set in R^n and h : R^n → R̄ with K ∩ dom h ≠ ∅. We say that h is (c) positively quasiconvex on K if for any x there exists α(x) > 0 such that h is α(x)-quasiconvex on S_{h(x)}(h).
The following result presents a connection between strongly quasiconvex functions and positively quasiconvex ones. Proposition 3.8. Let h : R^n → R̄ be a strongly quasiconvex function, x ∈ R^n and α > 0. Then the following assertions hold. As a consequence, in both cases, h is positively quasiconvex on R^n.
Proof. The proofs are similar, so we only show (a). Take x ∈ R^n and ξ ∈ ∂((1/α)h)(x). Take y ∈ S_{h(x)}(h) and z = λy + (1 − λ)x with λ ∈ ]0, 1]. Then, for every y ∈ S_{h(x)}(h), by dividing by λ > 0 and taking the limit as λ descends towards 0, we obtain the desired estimate. Now, since h is strongly quasiconvex, arg min_{R^n} h has at most one point. If x ∈ arg min_{R^n} h, then condition (3.15) holds immediately. If x ∉ arg min_{R^n} h, then ξ ≠ 0, i.e., condition (3.15) holds for β/(2α||ξ||) > 0.
Therefore, h is positively quasiconvex on R n .
As a consequence, we have the following result.
Corollary 3.1. Let h : R^n → R̄ be a lower semicontinuous and strongly quasiconvex function with β = 1, let z ∈ R^n and x̄ ∈ Prox_h(z). If there exists ξ ∈ ∂^≤ h(x̄), then h is prox-convex on its sublevel set at the height h(x̄), i.e., h ∈ Φ(S_{h(x̄)}(h)).
Proof. If ξ ∈ ∂^≤ h(x̄), then, since h is lower semicontinuous and strongly quasiconvex with β = 1, the conclusion follows by Proposition 3.8(b). Another consequence is the following sufficient condition for inf-compactness under an L-Lipschitz assumption, which revisits [29, Corollary 1]. Corollary 3.2. Let h : R^n → R be an L-Lipschitz and strongly quasiconvex function. Then h is inf-compact on R^n.
Proof. If h is strongly quasiconvex, then h is neatly quasiconvex, and since h is L-Lipschitz, ∂^≤ h(x) ≠ ∅ for all x ∈ R^n by Lemma 2.1(b). Now, by Proposition 3.8(b), it follows that h is positively quasiconvex on R^n. Finally, h is inf-compact on R^n by [11, Corollary 3.6].
We finish this section with the following observation. Remark 3.6. There are (classes of) prox-convex functions which are neither convex nor strongly quasiconvex. Indeed, for all n ∈ N, take K_n := [−n, +∞[ and the continuous quasiconvex functions h_n : K_n → R given by h_n(x) = x³. Clearly, h_n is neither convex nor strongly quasiconvex on K_n, hence not strongly G-subdifferentiable either.
Take n ∈ N. Then for all z ∈ K_n, arg min_{K_n} h_n = Prox_{h_n}(z) = {−n}, thus S_{h_n(−n)}(h_n) = {−n}, i.e., ∂^≤_{K_n} h_n(−n) = R^n. Therefore, h_n ∈ Φ(K_n) for all n ∈ N. Taking also into consideration Corollary 3.1, one can conclude that the classes of strongly quasiconvex and prox-convex functions intersect without being included in one another. Remark 3.7. All the prox-convex functions we have identified so far are semistrictly quasiconvex, too, while there are semistrictly quasiconvex functions that are not prox-convex (for instance h : R → R defined by h(x) = 1 if x = 0 and h(x) = 0 if x ≠ 0); hence the connection between the classes of prox-convex and semistrictly quasiconvex functions remains an open problem.

Proximal point type algorithms for nonconvex problems
In this section we show that the proximal point type algorithm remains convergent when the function to be minimized is proper, lower semicontinuous and prox-convex (on a given closed convex set), but not necessarily convex. Although the algorithm considered below is the simplest and most basic version available, and some of the advances achieved in the convex case, such as accelerations and additional flexibility through additional parameters, remain open in the prox-convex setting, our investigations show that proximal point type methods can be successfully extended to other classes of nonconvex optimization problems for which they could not be employed so far due to the lack of a theoretical foundation.
Theorem 4.1. Let K be a closed and convex set in R^n and h : R^n → R̄ be a proper, lower semicontinuous function that is prox-convex on K, such that arg min_K h ≠ ∅ and K ∩ dom h ≠ ∅. Choose any x¹ ∈ K and for any k ∈ N set

x^{k+1} := Prox_h(K, x^k). (4.1)

Then {x^k}_k is a minimizing sequence of h over K, i.e., h(x^k) → min_{x ∈ K} h(x) as k → +∞.
On the other hand, take x̄ ∈ arg min_K h. Then, for any k ∈ N, by taking x = x̄ in (4.2), we obtain the corresponding estimate, where we used that h(x̄) ≤ h(x^{k+1}). Thus, {||x^k − x̄||}_k is bounded. Then, passing to a subsequence if necessary, x^k → x̃ as k → +∞ for some x̃ ∈ K. Finally, since h is lower semicontinuous and K is closed, we have lim inf_{k→+∞} h(x^k) = min_{x ∈ K} h(x).
Remark 4.1. From (4.3) one can straightforwardly deduce that the known O(1/k) rate of convergence (in the number of iterations k) of the proximal point algorithm holds in the prox-convex case, too.
Remark 4.2. Although the function minimized in Theorem 4.1 by means of the proximal point algorithm is assumed to be prox-convex, its prox-convex value α > 0 need not be known, even if it plays a role in the proof.

Remark 4.3. One can modify the proximal point algorithm by replacing the proximal step in (4.1) with Prox_h(S_{h(x^k)}(h), x^k) without affecting the convergence of the generated sequence. Note also that taking K = R^n in Theorem 4.1 one obtains the classical proximal point algorithm adapted for prox-convex functions, and not for a restriction of such a function to a given closed convex set K ⊆ R^n.

Example 4.1. Let K = [0, 2] × R and consider the function h : K → R given by h(x₁, x₂) = x₂² − x₁² − x₁. Observe that h is continuous, strongly convex in its second argument, and strongly quasiconvex (though not convex) in its first argument on [0, 2]; hence h is strongly quasiconvex without being convex on K. Furthermore, by Example 3.1, h is prox-convex on K. The global minimum of h over K is (2, 0)ᵀ, and it can be found by applying Theorem 4.1, i.e., via the proximal point algorithm, although h is not convex. First one determines the proximity operator of h with respect to K: the objective separates in the two coordinates, being concave in x₁ on [0, 2] and strongly convex in x₂, and, taking into consideration the way K is defined, it follows that the proximal step in Theorem 4.1 delivers x^{k+1} = (2, x₂^k/3)ᵀ, where x^k = (x₁^k, x₂^k)ᵀ. Whatever feasible starting point x¹ ∈ K is chosen, the algorithm delivers the global minimum of h over K, because x₂^k = x₂¹/3^{k−1} for all k ∈ N and x₁^k = 2 for all k ≥ 2.
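The computations in Example 4.1 can be reproduced with a short script implementing the proximal step of Theorem 4.1 coordinatewise (a sketch; the starting point is an arbitrary feasible choice):

```python
import numpy as np

def prox_step(z):
    # One proximal step for h(x1, x2) = x2**2 - x1**2 - x1 over K = [0, 2] x R:
    # minimize h(x) + 0.5*||x - z||**2, which separates in the two coordinates.
    # In x1 the objective is concave on [0, 2], so its minimum sits at an endpoint.
    f = lambda t: -t**2 - t + 0.5 * (t - z[0]) ** 2
    x1 = 0.0 if f(0.0) < f(2.0) else 2.0
    # In x2 the objective x2**2 + 0.5*(x2 - z2)**2 is strongly convex;
    # setting its derivative 3*x2 - z2 to zero gives x2 = z2 / 3.
    x2 = z[1] / 3.0
    return np.array([x1, x2])

x = np.array([0.5, 9.0])   # an arbitrary feasible starting point in K
for _ in range(30):
    x = prox_step(x)

# the iterates follow x^{k+1} = (2, x2^k / 3) and converge to the
# global minimizer (2, 0) of h over K
assert np.allclose(x, [2.0, 0.0], atol=1e-10)
```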