New Tour on the Subdifferential of Supremum via Finite Sums and Suprema

This paper provides new characterizations for the subdifferential of the pointwise supremum of an arbitrary family of convex functions. The main feature of our approach is that the normal cone to the effective domain of the supremum (or to finite-dimensional sections of it) does not appear in our formulas. Another aspect of our analysis is that it emphasizes the relationship with the subdifferential of the supremum of finite subfamilies, or equivalently, finite weighted sums. Some specific results are given in the setting of reflexive Banach spaces, showing that the subdifferential of the supremum can be reduced to the supremum of a countable family.


Introduction
The characterization of the subdifferential of the pointwise supremum of a family of functions has attracted the attention of many researchers. Their interest comes from the fact that a huge number of important functions in convex analysis and optimization (like the Fenchel conjugate, the sum, the composition with affine mappings, etc.) can be expressed as suprema of this type. Accordingly, many publications in the last decades dealt with supremum functions and their subdifferentials and, among the most remarkable, we quote here the following ones: Brøndsted [1], Ioffe [9], Ioffe and Levin [10], Ioffe and Tikhomirov [11], Levin [12], Pschenichnyi [16], Rockafellar [17], Valadier [19], etc. See [18] to trace out the historical evolution of the topic.
More precisely, let $f := \sup_{t\in T} f_t$ be the pointwise supremum of a family of convex functions $f_t : X \to \mathbb{R}\cup\{+\infty\}$, $t\in T$, where $T$ is an arbitrary non-empty set and $X$ is a separated locally convex space. Many authors have addressed the problem of characterizing the subdifferential of the supremum, $\partial f(x)$, at any point $x\in\operatorname{dom} f$, the effective domain of $f$. These characterizations are usually given in terms of the (approximate) subdifferentials of the data functions, $\partial_\varepsilon f_t(x)$, $t\in T$, $\varepsilon\ge 0$, and, in the most general cases, also in terms of the normal cone to (finite-dimensional sections of) the effective domain of $f$, $N_{L\cap\operatorname{dom} f}(x)$. For instance, if $f_t\in\Gamma_0(X)$, $t\in T$, where $\Gamma_0(X)$ is the family of proper convex and lower semicontinuous (lsc, in brief) functions, then the following key formula is proved in [7,Theorem 4] (see [14,Theorem 4] and [13] for related formulas):
$$\partial f(x)=\bigcap_{\varepsilon>0,\ L\in\mathcal{F}(x)}\overline{\operatorname{co}}\Big\{\bigcup_{t\in T_\varepsilon(x)}\partial_\varepsilon f_t(x)+N_{L\cap\operatorname{dom} f}(x)\Big\},\qquad (1)$$
where $\overline{\operatorname{co}}$ stands for the $w^*$-closed convex hull,
$$T_\varepsilon(x):=\{t\in T: f_t(x)\ge f(x)-\varepsilon\},\qquad (2)$$
and $\mathcal{F}(x):=\{L\subset X: L \text{ is a finite-dimensional linear subspace such that } x\in L\}$.
In the so-called compact setting, in which $T$ is compact and the mappings $t\mapsto f_t(z)$, $z\in X$, are upper semicontinuous (usc, in brief), the following result, involving only the active functions at the reference point $x$, is established in [4,Theorem 3.8]:
$$\partial f(x)=\bigcap_{\varepsilon>0,\ L\in\mathcal{F}(x)}\overline{\operatorname{co}}\Big\{\bigcup_{t\in T(x)}\partial_\varepsilon f_t(x)+N_{L\cap\operatorname{dom} f}(x)\Big\},\qquad (3)$$
where $T(x):=T_0(x)$ (see (2)). To get simpler formulas, without these normal cones, one possibility is to impose additional assumptions such as the continuity of $f$ at $x$, in which case (1) gives rise to ([7,Corollary 10]; see, also, [20], for normed spaces):
$$\partial f(x)=\bigcap_{\varepsilon>0}\overline{\operatorname{co}}\Big\{\bigcup_{t\in T_\varepsilon(x)}\partial_\varepsilon f_t(x)\Big\}.$$
The operation of taking the pointwise supremum is specific to convex analysis and has no counterpart in differential calculus. Since the sum operation is fundamental in classical calculus, many authors have naturally been led to establish a relationship between these two operations. In other words, they aimed to transform the supremum into a sum, in order to use the classical tools for differentiable functions, such as Fermat's rule and many others.
In the case of finitely many functions f 1 , · · · , f n , with f = max 1≤k≤n f k , it is well-known that for every x ∈ X and ε ≥ 0 ([21, Corollary 2.8.11], see also Lemma 11 in Appendix for an alternative proof based on the minimax theorem) with and n being the canonical simplex in R n . The purpose of this paper is to establish new characterizations of ∂ f (x), in which only the data functions f t 's appear and without involving the extra term N dom f (x); namely, we provide the following more general formulas where T ε (x) := {J ⊂ T : J finite and max or equivalently, using (4), where Preliminary results in this direction have been obtained in [5] for the compact setting.
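The link between a finite maximum and weighted sums over the canonical simplex can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not in the paper: X is the real line, the three affine functions given by `a` and `b` are hypothetical data, and the simplex is sampled on a uniform grid. It checks that the maximum of the values coincides with the maximum of the weighted sums, and that convex combinations of the active slopes satisfy the subgradient inequality at the reference point.

```python
import itertools
import numpy as np

# Hypothetical data: three affine functions f_k(x) = a[k] * x + b[k] on the real line.
a = np.array([-1.0, 0.5, 2.0])
b = np.array([0.0, 0.0, -1.5])


def f_values(x):
    """Values f_1(x), ..., f_n(x) of the affine pieces."""
    return a * x + b


def f(x):
    """Finite maximum f = max_k f_k."""
    return float(np.max(f_values(x)))


def max_over_simplex(values, grid=20):
    """Maximise sum_k lam_k * values[k] over a uniform grid on the canonical simplex."""
    n = len(values)
    best = -np.inf
    for combo in itertools.product(range(grid + 1), repeat=n - 1):
        rest = grid - sum(combo)
        if rest < 0:
            continue  # these weights cannot be completed to sum to one
        lam = np.array(combo + (rest,), dtype=float) / grid
        best = max(best, float(lam @ values))
    return best


def is_subgradient(s, x, ys, tol=1e-12):
    """Check f(y) >= f(x) + s * (y - x) on the sample points ys."""
    return all(f(y) >= f(x) + s * (y - x) - tol for y in ys)


x0 = 1.0                                   # f_2 and f_3 are active at x0
ys = np.linspace(-5.0, 5.0, 101)           # sample points for the inequality
active = np.isclose(f_values(x0), f(x0))   # mask of active indices
```

Here `max_over_simplex(f_values(x0))` agrees with `f(x0)` because the grid contains the vertices of the simplex, and every slope in the interval spanned by `a[active]` passes `is_subgradient`, while slopes outside that interval fail on the sample.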
Both formulas (5) and (6) highlight the role played by the almost active functions at the reference point, whereas the normal cone which is present in (1) is now replaced by weighted finite maxima and sums. Formula (6) naturally covers some other formulas from the literature, such as those established in [8,Theorem 2] for the case of exact subdifferentials (see, also, [15,Theorem 1]). At the same time, we prove that the choice of the involved convex combinations in (5) and (6) can be made more precise in the so-called compact setting; in fact, we establish that for any fixed $t_0\in T(x)$ we have that with
The paper is structured as follows. After the section presenting the notation and preliminary results used in the paper, Sect. 3 provides, in Theorem 2, a representation of $\partial f(x)$ by means of specific convex combinations of the $f_t$'s which involve at most two functions. Proposition 3, the first result in Sect. 4, which deals with the non-compact setting, provides the reduction of the index set $T$ to countable subsets. In this section, Theorems 4 and 5 give non-compact counterparts of the characterizations of $N_{\operatorname{dom} f}(x)$ and $\partial f(x)$ established in [5]. Some technical results and/or proofs are deferred to the appendix, in order to simplify the presentation of the more relevant results in the paper.
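Given two sets $A$ and $B$ in $X$ (or in $X^*$), and $\Lambda\subset\mathbb{R}$, we define
$$A+B:=\{a+b: a\in A,\ b\in B\},\qquad \Lambda A:=\{\lambda a: \lambda\in\Lambda,\ a\in A\}.$$
By $\operatorname{co}(A)$ and $\operatorname{cone}(A)$, we denote the convex and the conical convex hulls of the non-empty set $A$, respectively. On the topological side, $\operatorname{cl}(A)$ and $\overline{A}$ are used interchangeably to denote the closure of $A$. When $A\subset X^*$, the closure is taken with respect to the $w^*$-topology, unless explicitly stated otherwise.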
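Associated with a non-empty set $A\subset X$, we define the negative dual cone and the orthogonal subspace of $A$ as follows
$$A^-:=\{x^*\in X^*: \langle x^*,x\rangle\le 0 \text{ for all } x\in A\},\qquad A^\perp:=\{x^*\in X^*: \langle x^*,x\rangle= 0 \text{ for all } x\in A\},$$
respectively. Observe that $A^-=(\operatorname{cone}(A))^-$. These concepts are defined similarly for sets in $X^*$. The so-called bipolar theorem establishes that
$$A^{--}:=(A^-)^-=\overline{\operatorname{cone}}(A).$$
If $A\subset X$, we define the normal cone to $A$ at $x$ by
$$N_A(x):=\{x^*\in X^*: \langle x^*,y-x\rangle\le 0 \text{ for all } y\in A\} \text{ if } x\in A,\qquad N_A(x):=\emptyset \text{ otherwise}.$$
If $A\neq\emptyset$ is convex and closed, $A_\infty$ represents its recession cone defined by
$$A_\infty:=\{y\in X: x+\lambda y\in A \text{ for some } x\in A \text{ and all } \lambda\ge 0\}.$$
Given a function $f: X\to\mathbb{R}\cup\{\pm\infty\}$, its (effective) domain is $\operatorname{dom} f:=\{x\in X: f(x)<+\infty\}$, and $f$ is proper when $\operatorname{dom} f\neq\emptyset$ and $f(x)>-\infty$ for all $x\in X$. The closed convex hull of $f$, denoted by $\overline{\operatorname{co}} f$, is the largest lsc convex function dominated by $f$. If $f$ is convex, then $\overline{\operatorname{co}} f=\operatorname{cl} f$, the closed hull of $f$. For $x\in X$ and $\varepsilon\ge 0$, the $\varepsilon$-subdifferential (or the approximate subdifferential) of $f$ at $x$ is
$$\partial_\varepsilon f(x):=\{x^*\in X^*: f(y)-f(x)\ge\langle x^*,y-x\rangle-\varepsilon \text{ for all } y\in X\}$$
when $f(x)\in\mathbb{R}$, and $\partial_\varepsilon f(x):=\emptyset$ otherwise. If $f\in\Gamma_0(X)$, $x\in\operatorname{dom} f$, and $\varepsilon>0$, then $\partial_\varepsilon f(x)\neq\emptyset$ and we have and Formula (12) is also valid for $\varepsilon=0$ provided that $\partial f(x)\neq\emptyset$.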
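As a quick illustration of the $\varepsilon$-subdifferential, the following worked computation treats a simple one-dimensional case. The choice $X=\mathbb{R}$ and $f(y)=y^2/2$ is an assumption made here for illustration only and does not come from the paper; the computation uses nothing beyond the defining inequality of the approximate subdifferential.

```latex
% Worked example (assumption: X = \mathbb{R} and f(y) = y^2/2).
\begin{align*}
x^* \in \partial_\varepsilon f(x)
&\iff \tfrac12 y^2 - \tfrac12 x^2 \ge x^*(y-x) - \varepsilon
      \quad \text{for all } y \in \mathbb{R} \\
&\iff \min_{y\in\mathbb{R}} \big\{ \tfrac12 y^2 - x^* y \big\}
      \ge \tfrac12 x^2 - x^* x - \varepsilon \\
&\iff -\tfrac12 (x^*)^2 \ge \tfrac12 x^2 - x^* x - \varepsilon
      \qquad (\text{the minimum is attained at } y = x^*) \\
&\iff \tfrac12 (x^* - x)^2 \le \varepsilon .
\end{align*}
% Hence
\[
\partial_\varepsilon f(x) = \big[\, x - \sqrt{2\varepsilon},\; x + \sqrt{2\varepsilon} \,\big],
\qquad
\partial f(x) = \partial_0 f(x) = \{x\}.
\]
```

In this example the approximate subdifferential is a non-empty compact interval for every $\varepsilon>0$, and it shrinks to the usual derivative as $\varepsilon\downarrow 0$.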
The Fenchel conjugate of f is the function f * : X * −→ R ∪ {±∞} given by and it is well-known that, for all x ∈ f −1 (R) and ε ≥ 0, The support and the indicator functions of A ⊂ X are, respectively, It is known that, if A is a closed convex set, or equivalently, by using (9), Next, given a finite family { f k , 1 ≤ k ≤ n} ⊂ 0 (X ), we consider the maximum function f = max 1≤k≤n f k . We suppose that f is proper and denote The adopted convention 0(+∞) = +∞ entails 0 is concave and usc for every x ∈ dom f , and ϕ(λ, ·) is convex and lsc for every λ ∈ n . Thus, since n is compact in R n and dom f is non-empty and convex, the minimax theorem ensures that (see, e.g., [21, Theorem 2.10 Moreover, since As a consequence of this, for every x ∈ dom f and ε ≥ 0 we obtain that (see Lemma 11 in Appendix) where  (19) and (20) are specific to finite families of functions, and so they cannot be extended to families with infinitely many functions, where the following simplices in R T , may be not compact.

The Compact Setting
We give in this section some results complementing those established in [5]. We consider a non-empty family $\{f_t, t\in T\}\subset\Gamma_0(X)$ such that $T$ is Hausdorff compact and, for each $z\in X$, the mapping $t\mapsto f_t(z)$ is upper semicontinuous.
The associated supremum function is $f:=\sup_{t\in T} f_t$, and assumptions (22) ensure that (see [5,Lemma 5]) and, for every $x\in\operatorname{dom} f$, Moreover, (22) also yields Assuming $\inf_{t\in T} f_t(x)>-\infty$, we have proved in [5,Theorem 12] that This formula involves the active functions $f_t$, $t\in T(x)$, as well as the non-active ones $f_t$, $t\in T\setminus T(x)$, the latter being affected by the weighting parameter $\varepsilon>0$. The main ingredient used to establish (25) is the following relation ([5, Theorem 6]) We give next an equivalent description of the elements in $N_{\operatorname{dom} f}(x)$, which highlights the role played by the active and non-active functions.
Proof We fix ε > 0 and 0 < μ t < 1, for all t ∈ T \ T (x), and denote The sets T (x) and E ε are non-empty thanks to (22) and the lower semicontinuity of the f t 's. Since by (15), and (9), desired relation (26) is equivalent to To prove this inclusion we take, using (13), Hence, since for every t ∈ T \ T (x), by (11), and, so (24) gives rise to ) and the desired inclusion follows.
The main purpose of this section is to obtain another representation of ∂ f (x), which involves appropriate convex combinations of the non-active f t 's. In the non-compact setting, instead of considering two-elements convex combinations as in the compact framework, we shall appeal to all finite-elements convex combinations of the f t 's (see Theorem 5 below). where Proof Let us suppose, for simplicity, that f (x) = 0, and observe that for each given ε > 0 we have, for every t ∈ T \ T (x), Let us also denotef Thus, since we also have and the inclusion "⊃" follows by taking the closed convex hull and intersecting over ε > 0. To establish the inclusion "⊂", we fix ε > 0 and L ∈ F(x). Next, by applying Lemma 1 to the family {f t,ε , t ∈ T ; I L } we obtain that where the last equality comes from (47). Therefore, by (3), Intersecting over the L's in F(x) we get where the last equality is due to the fact that, for every A ⊂ X * (see ([3, Lemma 3])), In the particular case when all the f t 's are active at x, that is, which extends the well-known Brøndsted formula [1] to infinite index sets. Another illustration of Theorem 2 is the alternative proof of formula (51) in Appendix.

Non-Compact Framework
This section is devoted to giving new characterizations of $N_{\operatorname{dom} f}(x)$ and $\partial f(x)$, without any additional assumptions on the family $\{f_t, t\in T\}\subset\Gamma_0(X)$.
The first result, whose proof is postponed to Appendix, provides the reduction of the index set T to countable subsets within the normal cone of dom f .

Proposition 3 Consider a family
The following result provides the non-compact counterpart of the characterizations of N dom f (x) established in [5].

Theorem 4
Consider the family { f t , t ∈ T } ⊂ 0 (X ) and f := sup t∈T f t . Given x ∈ dom f , for every ε > 0 we have that where T := {J ⊂ T , |J | < +∞} and In addition, (32) becomes an equality when Remark 1 (Before the proof ) Condition (34) is not very restrictive, indeed, it suffices to choose t 0 ∈ T and consider the family {max{ f t , f t 0 }, t ∈ T }. This new family obviously satisfies condition (34), where T := {J ∈ T : t 0 ∈ J }, and, consequently, Theorem 4 yields Proof Take u * ∈ N dom f (x) and ε > 0. Then, by Proposition 3, for every fixed We denote J n := {t 1 , · · · , t n }, n ≥ 1, and introduce the functionŝ where f J n = max{ f t , t ∈ J n } (see (33)). So, (f n ) n is non-decreasing and sup n≥1 f t n + I L = sup n≥1f n and dom sup n≥1 f t n ∩ L = dom sup n≥1f n .
In addition, according to Lemma 12 and (46), we have that Therefore, using (12), where the last equality is a consequence of (31).
For the converse inclusion, observe that (34) implies the existence of a constant M such that Then, for every J ∈ T and x * ∈ ∂ ε f J (x), where the last equality comes from (12).
Next, we give the main result in this section, which constitutes a non-compact counterpart of Theorem 2.

Theorem 5
Consider the family { f t , t ∈ T } ⊂ 0 (X ) and f := sup t∈T f t . Then for every x ∈ dom f we have that Proof Fix x ∈ dom f and ε > 0 so that, by formula (1), and whichever L ∈ F(x) we take, one has Now we pick t 0 ∈ T ε (x), and denote T := {J ∈ T : t 0 ∈ J }, and Then, by Remark 1, and taking into account (47 ), (48), and (39), where we have denoted So, (37) gives rise to that is, the desired inclusion "⊂" follows once we intersect over ε > 0.
To verify the opposite inclusion, by (36) we easily observe that and so, For x ∈ dom f , δ ≥ 0 and J ∈ T , we denote Observe that Theorem 5 leads us to the characterization below, involving the finite suprema f J or sums t∈J λ t f t .

Corollary 6
Consider the family { f t , t ∈ T } ⊂ 0 (X ) and f := sup t∈T f t . Then for every x ∈ dom f we have that and, consequently, and so, by Theorem 5 and Lemma 13 (for the second inclusion), Hence, the inclusion "⊂" in (41) follows.
To verify the opposite inclusion, take x * ∈ ∂ ε f J (x), J ∈ T ε (x), and ε > 0. Then, for every y ∈ X , and we are done with the first statement.

Remark 2
Let us emphasize at this point that the main feature of our approach is to provide characterizations of ∂ f (x), which are independent of the effective domains of the involved functions and the associated normal cones. For comparative purposes, we quote here the following formula, given in [15,Theorem 1], with D being any subset of X satisfying Observe that formula (43) requires the use of the augmented functions f t + I D and not the exact ones f t 's as in (42). The following example illustrates the difference between (42) and (43).

Example 1
Consider the support function of a non-empty set $T\subset X^*$, $f(x):=\sigma_T(x)=\sup_{t\in T}\langle t,x\rangle$, $x\in X$.
On the one hand, for every x ∈ X , formula (42) yields which is well-known (see, for instance, [6, (5) in page 834]); actually, it is a consequence of the Fenchel equality as On the other hand, if we apply formula (43) choosing D = dom σ T , then we obtain that Hence, using Lemma 14, we derive the following alternative representation of ∂ f (x), which appeals to the extra term {y * ∈ (coT ) ∞ : −ε ≤ y * , x ≤ 0}.
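For a finite set T in a Euclidean space, the subdifferential of the support function at x is the convex hull of the points of T that attain the supremum, which is a standard fact. The sketch below checks the subgradient inequality numerically in that special case. The set `T`, the point `x0` and the random sample `ys` are hypothetical choices made here for illustration, and the finite-dimensional finite setting is an assumption not made in Example 1.

```python
import numpy as np

# Hypothetical finite set T in R^2 (each row is a point t of T).
T = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, -1.0]])


def sigma(x):
    """Support function sigma_T(x) = max over t in T of <t, x>."""
    return float(np.max(T @ x))


def active_points(x, eps=0.0):
    """Points t of T with <t, x> >= sigma_T(x) - eps (eps-active points)."""
    return T[T @ x >= sigma(x) - eps]


def is_subgradient(s, x, ys, tol=1e-12):
    """Check sigma_T(y) >= sigma_T(x) + <s, y - x> on the sample points ys."""
    return all(sigma(y) >= sigma(x) + s @ (y - x) - tol for y in ys)


rng = np.random.default_rng(0)
# Sample points: the origin plus 500 Gaussian draws.
ys = np.vstack([np.zeros(2), rng.normal(size=(500, 2))])

x0 = np.array([1.0, 0.0])
A = active_points(x0)   # the two points (1, 0) and (1, 1) attain sigma_T(x0) = 1
# Convex combinations of the active points.
combos = [mu * A[0] + (1.0 - mu) * A[1] for mu in np.linspace(0.0, 1.0, 5)]
```

Every element of `combos` passes `is_subgradient(s, x0, ys)`, whereas the non-active points of `T` fail the inequality already at the origin. Enlarging `eps` in `active_points` adds almost active points, in the spirit of the index sets used in the formulas above.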
We apply Corollary 6 to provide a new proof for the characterization of the normal cone to sublevel sets given in [8,Corollary 7] (see, also, [2] and references therein).

Corollary 7
Consider a function $g\in\Gamma_0(X)$ and let $x\in X$ be such that $g(x)=0$. Then we have that
Proof We define the functions $f_t:=tg$, $t>0$, and $f:=\sup_{t>0} f_t$. Obviously, $\{f_t, t\in T\}\subset\Gamma_0(X)$ and $f_t(x)=f(x)=0$ for all $t>0$. Therefore, since $f=I_{[g\le 0]}$, by formula (42) we obtain that Hence, and we are done.
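The identity used in the proof, namely that the supremum of the functions tg over t > 0 is the indicator of the sublevel set [g ≤ 0], can be visualised by truncating the supremum at a finite bound. The sketch below does this for a hypothetical one-dimensional function g; the function and the truncation level `t_max` are illustrative assumptions, not part of the paper.

```python
def g(y):
    """Hypothetical convex function with g(1) = 0; its sublevel set [g <= 0] is [-1, 1]."""
    return y * y - 1.0


def sup_tg(y, t_max):
    """Supremum of t * g(y) over 0 < t <= t_max (a truncation of the sup over all t > 0).

    Since t -> t * g(y) is linear, the supremum is 0 when g(y) <= 0
    (approached as t -> 0+) and t_max * g(y) when g(y) > 0.
    """
    return max(0.0, t_max * g(y))


# Inside the sublevel set the truncated supremum stays at 0; outside it grows with t_max.
inside = [sup_tg(0.5, t_max) for t_max in (1e1, 1e3, 1e6)]
outside = [sup_tg(2.0, t_max) for t_max in (1e1, 1e3, 1e6)]
```

As `t_max` grows, `inside` remains at zero while `outside` increases without bound, which is the behaviour of the indicator function of [g ≤ 0].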
The following corollary gives more insight into the conclusion of Corollary 6 in reflexive Banach spaces.
where B X * is the closed unit ball in X * . Moreover, denoting J := ∪ n≥1 J n , for every z * ∈ ∂ 1 n f J n (x) we have that showing that z * ∈ ∂ 2 Hence, (45) gives rise to and v * n ∈ 1 n B X * , n ≥ 1. Hence, v * n → θ and we obtain, for every y ∈ X , which shows that x * ∈ ∂ f J (x). Moreover, since that We are done since the opposite inclusion holds straightforwardly.

Concluding Remarks
This paper is intended to establish new characterizations of the subdifferential of the pointwise supremum of an arbitrary family of convex functions which are free of the normal cone to the effective domain of the supremum (or to finite-dimensional sections of it). These characterizations involve both (almost) active and non-(almost) active functions, the last ones being affected by a weighting parameter. Main formulas (5) and (6) highlight the role played by the almost active functions at the reference point. Formula (6) covers some other formulas in the literature; e.g., [8,Theorem 2] in the case of exact subdifferentials (see, also, [15,Theorem 1]).
The first part of the paper deals with the so-called compact scenario, in which the index set is assumed to be compact and the functions upper semicontinuous with respect to the index. In this part, we first provide an explicit representation of the subdifferential of the supremum in Theorem 2, in terms of the active functions on the one hand, and of specific two-element convex combinations on the other.
In the second part of the paper, these compactness/upper semicontinuity assumptions are removed, and the main Theorem 5 constitutes a non-compact counterpart of Theorem 2. We also aimed to emphasize the relationship of the subdifferential of the supremum function with the subdifferential of finite weighted sums. This is the purpose of (42) in Corollary 6.
Some consequences of the main results in the setting of reflexive Banach spaces are also analyzed. In particular, it turns out that formulas (41) and (42) are valid when the closure is taken with respect to the strong (norm) topology. The last proposition in the paper shows that, in this setting, the subdifferential of the supremum of the whole family can be reduced to that of the supremum of a countable subfamily.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Lemma 11
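Consider a finite family $\{f_k, 1\le k\le n\}\subset\Gamma_0(X)$, and $f=\max_{1\le k\le n} f_k$. If the function $\varphi$ is defined as in (16), i.e., $\varphi(\lambda,x):=\sum_{k=1}^{n}\lambda_k f_k(x)$, then for every $x\in\operatorname{dom} f$ and $\varepsilon\ge 0$ we have that In particular, we have that
Proof Given $x\in X$ and $\varepsilon\ge 0$, we first assume that $\theta\in\partial_\varepsilon f(x)$. So, $x$ is an $\varepsilon$-minimizer of the function $f$, and (18) yields some $\lambda\in\Delta_n$ such that ϕ(λ,x)).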
The following result is also used in the current work, and is of independent interest.
Lemma 12 Let ( f n ) n ⊂ 0 (X ) be a non-decreasing sequence, and denote f := sup n≥1 f n . Then, for all x ∈ dom f and ε ≥ 0, Proof Take x * ∈ ∂ ε f (x) and fix δ > 0. Then We know that and so, since ( f * n ) n is non-increasing, the function inf n≥1 f * n is convex and f * is the closed hull of inf n≥1 f * n , i.e., Then and, so, there exists a net (x * i ) i ⊂ X * w * -converging to x * such that In other words, for each i, there exists k ≥ 1 such that for all n ≥ k that is, and, by taking the limit on i, The direct inclusion follows by intersecting over δ > 0. The opposite inclusion is straightforward.
The following technical lemma is needed in the proof of Corollary 6.

Proof of Proposition 3
Fix positive integers $m, n$ with $m>f(x)$, and take $\delta>0$. Since we have that, for every $y\in L$, $f(y)\le m \Rightarrow \langle nu^*, y-x\rangle\le 0<\delta$, that is, and so $\langle nu^*, y-x\rangle\ge\delta$, $y\in L$ $\Rightarrow$ $\exists\, t\in T$ such that $f_t(y)>m$.
In other words, and this shows that where B δ (x) denotes the ball in L centered at x with radius δ (L endowed with the relative topology of X is isomorphic to an Euclidean space and, consequently, B δ (x) is compact). Therefore, since the sets [ f t > m] are open, by the lower semicontinuity of the f t 's, due to (55) we find a finite set t  i is the supremum of a countable family.
In the following lemma, the result corresponding to ε = 0 can be found in [6, (8) in page 835].