On semi-continuity and continuity of the smallest and largest minimizing point of real convex functions with applications in probability and statistics

We prove that the smallest minimizer σ(f) of a real convex function f is less than or equal to a real point x if and only if the right derivative of f at x is non-negative. Similarly, the largest minimizer τ(f) is greater than or equal to x if and only if the left derivative of f at x is non-positive. From this simple result we deduce measurability and semi-continuity of the functionals σ and τ. Furthermore, if f has a unique minimizing point, so that σ(f) = τ(f), then both functionals are continuous at f. With these analytical preparations we can apply Continuous Mapping Theorems to obtain several Argmin theorems for convex stochastic processes. The novelty lies in statements about classical distributional convergence and almost sure convergence when the limit process does not have a unique minimum point. This is possible by replacing the natural topology on R with the order topologies. Another new feature is that not only sequences but, more generally, nets of convex stochastic processes are allowed.


Introduction
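Let $f : \mathbb{R} \to \mathbb{R}$ be a convex function with pertaining set

$$A(f) := \{x \in \mathbb{R} : f(x) = \inf_{t \in \mathbb{R}} f(t)\} \tag{1}$$

of all minimizing points. This minimum set can equivalently be described in terms of the right and left derivatives $D^+ f$ and $D^- f$ of $f$. Indeed, it follows from Theorem 23.2 of Rockafellar [21] that

$$A(f) = \{x \in \mathbb{R} : D^- f(x) \le 0 \le D^+ f(x)\}. \tag{2}$$

Of course, it can happen that $A(f)$ is empty. However, if this is not the case, it is well-known (and actually easy to see) that $A(f)$ is closed and convex and hence is a closed interval. The extreme case $A(f) = \mathbb{R}$ occurs if and only if $f$ is a constant function. So, as long as $f$ is not a constant function and $A(f)$ is non-empty, there are three possibilities:

(i) $A(f) = [a, b]$ with real numbers $a \le b$,
(ii) $A(f) = [a, \infty)$ with $a \in \mathbb{R}$,
(iii) $A(f) = (-\infty, b]$ with $b \in \mathbb{R}$.

Let $C := \{f : \mathbb{R} \to \mathbb{R};\ f \text{ convex}\}$ be the class of all convex functions. Introduce $S := \{f \in C : f \text{ with (i) or (ii)}\}$ and $S' := \{f \in C : f \text{ with (i) or (iii)}\}$.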
Thus $S$ and $S'$ consist exactly of those functions for which the smallest and the largest minimizer, respectively, exist. Consequently, the functionals $\sigma : S \to \mathbb{R}$ and $\tau : S' \to \mathbb{R}$ given by $\sigma(f) = \min A(f)$ and $\tau(f) = \max A(f)$ are well defined on their domains.
Our first main result provides a necessary and sufficient condition for the location of the smallest and the largest minimizer, respectively. It is rather simple with an elementary proof, but from it we will draw a whole series of useful conclusions. In section 2 it is shown that $\sigma$ and $\tau$ are measurable and semi-continuous. On the set $S_u$ of all convex functions with exactly one minimizer the functionals $\sigma$ and $\tau$ coincide and are furthermore continuous there. In section 3 this is used in combination with Continuous Mapping Theorems to derive several Argmin theorems for convex stochastic processes. For a further discussion of our findings we refer to the concluding remarks at the end of section 3.
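Theorem 1. Let $\sigma(f) = \min A(f)$ and $\tau(f) = \max A(f)$ be the smallest and largest minimizing point of the convex function $f$ (for $f \in S$ and $f \in S'$, respectively). Then the following equivalences hold for every $x \in \mathbb{R}$:

$$\sigma(f) \le x \iff D^+ f(x) \ge 0, \tag{3}$$

$$\tau(f) \ge x \iff D^- f(x) \le 0. \tag{4}$$

Proof. For the proof of (3) we briefly write $\sigma$ for $\sigma(f)$. Recall that $D^+ f$ and $D^- f$ are non-decreasing, confer, e.g., Theorem 1.3.3 in Niculescu and Persson [20]. Thus $\sigma \le x$ entails $D^+ f(x) \ge D^+ f(\sigma) \ge 0$, where the last inequality follows from (2), because $\sigma \in A(f)$. To see the reverse conclusion in (3), assume that $D^+ f(x) \ge 0$. By Theorem 1.3.1 in Niculescu and Persson [20] the difference quotients $\frac{f(t)-f(x)}{t-x}$ are non-increasing as $t \downarrow x$. Therefore we obtain

$$0 \le D^+ f(x) = \inf_{s > x} \frac{f(s)-f(x)}{s-x} \le \frac{f(t)-f(x)}{t-x} \quad \text{for all } t > x. \tag{5}$$

Multiplication with the positive difference $t - x$ yields that $f(t) \ge f(x)$ for all $t > x$ and so $\inf_{t>x} f(t) \ge f(x) \ge \inf_{t \le x} f(t)$, where the second inequality is trivial. To sum up we arrive at

$$D^+ f(x) \ge 0 \implies \inf_{t > x} f(t) \ge \inf_{t \le x} f(t). \tag{6}$$

Next we consider a point $t < \sigma$. Infer from the monotonicity of $D^- f$ that $D^- f(t) \le D^- f(\sigma) \le 0$, where the last inequality is ensured by (2). Just as for $D^+ f$, we see that

$$\frac{f(s)-f(t)}{s-t} \le D^- f(t) \le 0 \quad \text{for all } s < t.$$

Multiplication with the negative difference $s - t$ gives $f(s) \ge f(t)$ for all $s < t < \sigma$. Since $f$ is continuous, taking the limit $t \uparrow \sigma$ finally shows that $f$ is non-increasing on the closed half-line $(-\infty, \sigma]$. In particular, $f(t) \ge f(\sigma)$ for all $t \le \sigma$. Actually, on the open interval $(-\infty, \sigma)$ the inequality is strict:

$$f(t) > f(\sigma) \quad \text{for all } t < \sigma. \tag{7}$$

This is because otherwise there exists some $t_0 < \sigma$ such that $f(t_0) \le f(\sigma)$. Since $\sigma$ is a minimizing point, $t_0$ must be a minimizing point as well. This contradicts the minimality of $\sigma$. Now, assume that $x < \sigma$. Deduce from the fact that $f$ is non-increasing on $(-\infty, x] \subseteq (-\infty, \sigma]$ that $f(t) \ge f(x)$ for all $t \le x$. Consequently, $\inf_{t \le x} f(t) \ge f(x)$ and therefore by (6) and (7):

$$f(x) \le \inf_{t \le x} f(t) \le \inf_{t > x} f(t) \le f(\sigma) < f(x).$$

This is a contradiction and thus $x \ge \sigma$ is true as desired.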
For the proof of (4) we use a time-reversing argument. Introduce the function $f^-$ defined by $f^-(t) := f(-t)$, $t \in \mathbb{R}$. One easily verifies that $f^-$ is convex and that $\tau(f) = -\sigma(f^-)$. We obtain by (3):

$$\tau(f) \ge x \iff \sigma(f^-) \le -x \iff D^+ f^-(-x) \ge 0,$$

and the assertion follows upon noticing that $D^+ f^-(-x) = -D^- f(x)$.

Remark 1. Under the existence of the minimizers one has that:

$$\text{(a)}\ \ \sigma(f) \le x \le \tau(f) \iff \text{(b)}\ \ D^- f(x) \le 0 \le D^+ f(x). \tag{8}$$

Indeed, recall that $D^+ f$ and $D^- f$ are non-decreasing, which by (2) shows necessity of (b). By another application of (2) the point $x$ in (b) is a minimizer, $x \in A(f)$, and (a) follows using the minimum and maximum property of $\sigma(f) = \min A(f)$ and $\tau(f) = \max A(f)$, respectively. Note that we cannot readily infer Theorem 1 from (8). For example, if we only know that $D^+ f(x) \ge 0$ holds, then $x$ need not be a minimizing point in general, as can be seen from simple examples such as $f(x) = x^2$. Consequently, the argument via the minimum property of $\sigma(f)$ fails.
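The two equivalences of Theorem 1 can be checked by hand on a small example. The following sketch is only an illustration under stated assumptions: the piecewise linear function $f(x) = \max(-x, 0, x-1)$ is a hypothetical choice (not taken from the text), for which $A(f) = [0, 1]$, so $\sigma(f) = 0$ and $\tau(f) = 1$. For a maximum of finitely many affine functions, the right derivative is the largest slope among the active pieces and the left derivative is the smallest one.

```python
from fractions import Fraction

# Hypothetical example: f(x) = max(-x, 0, x - 1), stored as (slope, intercept) pairs.
PIECES = [(-1, 0), (0, 0), (1, -1)]
SIGMA, TAU = 0, 1  # A(f) = [0, 1] for this particular f


def f(x):
    """Value of the convex function max_i (a_i * x + b_i)."""
    return max(a * x + b for a, b in PIECES)


def active_slopes(x):
    """Slopes of all affine pieces that attain the maximum at x."""
    fx = f(x)
    return [a for a, b in PIECES if a * x + b == fx]


def right_derivative(x):
    """D^+ f(x): largest active slope."""
    return max(active_slopes(x))


def left_derivative(x):
    """D^- f(x): smallest active slope."""
    return min(active_slopes(x))


def check_theorem1(points):
    """True if both equivalences of Theorem 1 hold at every given point."""
    return all(
        (SIGMA <= x) == (right_derivative(x) >= 0)
        and (TAU >= x) == (left_derivative(x) <= 0)
        for x in points
    )


if __name__ == "__main__":
    grid = [Fraction(k, 4) for k in range(-8, 13)]  # exact rationals in [-2, 3]
    print(check_theorem1(grid))
```

Exact rational arithmetic is used so that the test for active pieces does not depend on a floating point tolerance.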
For every non-decreasing function F : R → R we introduce the generalized inverses: For properties of these inverses, confer Embrechts and Hofert [5], Feng et al. [6] or Fortelle [10]. Notice that (3) is the same as

Measurability, Semi-continuity and Continuity of the argmin-functionals
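Recall that $C$ is the class of all convex functions on $\mathbb{R}$. For each $t \in \mathbb{R}$ let $\pi_t : C \to \mathbb{R}$ denote the projection (evaluation map) at $t$, that is $\pi_t(f) = f(t)$. We endow the function space $C$ with the σ-algebra generated by the projections: $\mathcal{C} := \sigma(\pi_t : t \in \mathbb{R})$. Furthermore, $C$ is equipped with the topology $T$ of pointwise convergence, which is known to be generated by the projections: $T := \tau(\pi_t : t \in \mathbb{R})$. Recall that $T$ is the smallest topology on $C$ for which all projections are continuous. Note that the trace $\mathcal{C}_S := S \cap \mathcal{C}$ on $S$ is generated by the restrictions of $\pi_t$ to $S$. Analogously, the subspace topology $T_S = S \cap T$ on $S$ is generated by these restrictions. The corresponding statements hold for the trace $\mathcal{C}_{S'} = S' \cap \mathcal{C}$ and the subspace topology $T_{S'} = S' \cap T$.

Proposition 2. The functional $\sigma$ is $\mathcal{C}_S$-measurable and lower semicontinuous on $(S, T_S)$. The functional $\tau$ is $\mathcal{C}_{S'}$-measurable and upper semicontinuous on $(S', T_{S'})$.

Proof. For each $x \in \mathbb{R}$ we have that:

$$\{f \in S : \sigma(f) \le x\} = \{f \in S : D^+ f(x) \ge 0\} = \bigcap_{t > x} \{f \in S : f(t) - f(x) \ge 0\} = S \cap \bigcap_{t > x,\ t \in \mathbb{Q}} (\pi_t - \pi_x)^{-1}([0, \infty)),$$

where the first equality holds by (3), the second by the first equality in (5), and the third by continuity of $f$. By construction of $\mathcal{C}$ every projection is $\mathcal{C}$-measurable, whence the differences $\pi_t - \pi_x$ are $\mathcal{C}$-measurable as well and therefore $(\pi_t - \pi_x)^{-1}([0, \infty)) \in \mathcal{C}$ for all rationals $t > x$.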
Since $\mathcal{C}$ is closed under countable intersections we arrive at $\{f \in S : \sigma(f) \le x\} \in S \cap \mathcal{C} = \mathcal{C}_S$ for all $x \in \mathbb{R}$, which by Lemma 1.4 in Kallenberg [17] shows measurability of $\sigma$.
As to semicontinuity, recall that by construction of $T$ every projection $\pi_t$ is $T$-continuous, whence the differences $\pi_t - \pi_x$ are $T$-continuous as well and therefore the sets $(\pi_t - \pi_x)^{-1}([0, \infty))$ are $T$-closed for all $t > x$. Since arbitrary intersections of closed sets are closed, we see that $\{f \in S : \sigma(f) \le x\}$ is $T_S$-closed for all $x \in \mathbb{R}$, which shows $T_S$-semicontinuity of $\sigma$.
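The second part follows in the same way. Indeed, since by (4)

$$\{f \in S' : \tau(f) \ge x\} = \{f \in S' : D^- f(x) \le 0\} = \bigcap_{t < x} \{f \in S' : f(t) - f(x) \ge 0\},$$

the same arguments apply to $\tau$.

Next we give further equivalent characterizations of semi-continuity. The first one is an immediate consequence of Proposition 2 and the definition of continuity (pre-images of open sets are open).

Corollary 1. Let $O_< := \{(-\infty, a) : a \in \mathbb{R}\} \cup \{\emptyset, \mathbb{R}\}$ and $O_> := \{(a, \infty) : a \in \mathbb{R}\} \cup \{\emptyset, \mathbb{R}\}$ be the left-order topology and the right-order topology. Then $\sigma : (S, T_S) \to (\mathbb{R}, O_>)$ and $\tau : (S', T_{S'}) \to (\mathbb{R}, O_<)$ are continuous.

Sometimes it is advantageous to consider the restrictions of $\sigma$ and $\tau$ on subspaces.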
Corollary 1 in turn yields a second equivalent description of semi-continuity via net-convergence. For this purpose, let $(I, \le)$ be a directed set, here and in the following.
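Corollary 2. Assume that $(f_\alpha)_{\alpha \in I} \subseteq C$ converges pointwise to $f$ on $\mathbb{R}$. Then the following statements apply:

(1) If $f$ and all $f_\alpha$ lie in $S$, then $\liminf_\alpha \sigma(f_\alpha) \ge \sigma(f)$.
(2) If $f$ and all $f_\alpha$ lie in $S'$, then $\limsup_\alpha \tau(f_\alpha) \le \tau(f)$.
(3) If $f$ and all $f_\alpha$ lie in $S \cap S'$ and $\sigma(f) = \tau(f)$, then $\lim_\alpha \sigma(f_\alpha) = \lim_\alpha \tau(f_\alpha) = \sigma(f) = \tau(f)$.

Proof. By Corollary 1, $\sigma(f_\alpha) \to \sigma(f)$ in $(\mathbb{R}, O_>)$. A net $(y_\alpha)$ converges in $(\mathbb{R}, O_>)$ to $y$ if and only if $\liminf_\alpha y_\alpha \ge y$, which gives (1). In the same way one obtains (2) upon noticing that $y_\alpha \to y$ in $(\mathbb{R}, O_<)$ if and only if $\limsup_\alpha y_\alpha \le y$. Finally, (3) follows from (1) and (2), because $\sigma \le \tau$ and therefore

$$\sigma(f) \le \liminf_\alpha \sigma(f_\alpha) \le \limsup_\alpha \sigma(f_\alpha) \le \limsup_\alpha \tau(f_\alpha) \le \tau(f) = \sigma(f),$$

and likewise $\sigma(f) \le \liminf_\alpha \sigma(f_\alpha) \le \liminf_\alpha \tau(f_\alpha) \le \limsup_\alpha \tau(f_\alpha) \le \tau(f)$.

Semi-continuity of $\sigma$ and $\tau$ as stated in Proposition 2 and its reformulations in Corollaries 1 and 2 turns out to be a very strong tool for proving so-called Argmin theorems in probability and statistics.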
Occasionally it is stated in the literature that $\sigma$ or $\tau$ are actually continuous with respect to the natural topology $O_n$ on $\mathbb{R}$. But the following example shows that this is not true.
Obviously, $f$ and $f_n$, $n \in \mathbb{N}$, are convex and $f_n$ converges at every point (actually uniformly on $\mathbb{R}$) to $f$. However, $\tau(f_n) = 0$ for all $n \in \mathbb{N}$, whereas $\tau(f) = 1$ and consequently $\tau(f_n) \not\to \tau(f)$. Thus $\tau$ is not continuous at $f$, and from $\sigma(f) = -\tau(f^-)$ we infer that $\sigma$ is not continuous at $f^-$.
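To make the failure of continuity concrete, here is a small sketch with one possible pair of functions. These are assumptions chosen for illustration and need not be the functions of the example: $f(x) = \max(-x, 0, x-1)$ with $A(f) = [0, 1]$, and $f_n(x) = \max(-x, x/n, x - 1 + 1/n)$, which satisfies $\sup_x |f_n(x) - f(x)| = 1/n$ and has the unique minimizer $0$. The helper `argmin_interval` is hypothetical and only works for a maximum of affine functions whose minimum is attained, where the endpoints of the minimum set are kinks.

```python
from fractions import Fraction
from itertools import combinations


def evaluate(pieces, x):
    """Value at x of the convex function max_i (a_i * x + b_i)."""
    return max(a * x + b for a, b in pieces)


def argmin_interval(pieces):
    """Return (sigma, tau) for a max-affine function with attained minimum.

    The endpoints of the minimum set are kinks, i.e. intersection points
    of two affine pieces with different slopes.
    """
    kinks = [
        Fraction(b2 - b1) / Fraction(a1 - a2)
        for (a1, b1), (a2, b2) in combinations(pieces, 2)
        if a1 != a2
    ]
    best = min(evaluate(pieces, x) for x in kinks)
    minimizers = [x for x in kinks if evaluate(pieces, x) == best]
    return min(minimizers), max(minimizers)


# Assumed limit function f(x) = max(-x, 0, x - 1).
F = [(-1, 0), (0, 0), (1, -1)]


def f_n_pieces(n):
    """Assumed approximants f_n(x) = max(-x, x/n, x - 1 + 1/n)."""
    return [(-1, 0), (Fraction(1, n), 0), (1, Fraction(1, n) - 1)]


def sup_gap_on_grid(n):
    """Largest value of |f_n - f| on an exact grid in [-5, 5]."""
    grid = [Fraction(k, 4) for k in range(-20, 21)]
    return max(abs(evaluate(f_n_pieces(n), x) - evaluate(F, x)) for x in grid)


if __name__ == "__main__":
    print(argmin_interval(F))               # (sigma(f), tau(f))
    print(argmin_interval(f_n_pieces(10)))  # (sigma(f_10), tau(f_10))
    print(sup_gap_on_grid(10))
```

In this sketch $\tau(f_n) = 0$ for every $n$ although $\tau(f) = 1$, while $\sigma(f_n) = 0 = \sigma(f)$, which is consistent with lower semicontinuity of $\sigma$ and upper semicontinuity of $\tau$.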
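Note that the limit function in our example has no unique minimizing point. So let us consider the family $S_u$ of all functions $f$ with a unique minimizer:

$$S_u := \{f \in C : A(f) \text{ is a singleton}\}.$$

Clearly, the functionals $\sigma$ and $\tau$ coincide on $S_u$. Therefore, from Remark 2 we can infer that $\sigma$ is lower- and upper-semicontinuous on the subspace $(S_u, T_{S_u})$, whence $\sigma$ is continuous on $(S_u, T_{S_u})$ with respect to the natural topology $O_n$ on $\mathbb{R}$. We record this as Corollary 3. More generally, let $\xi$ be a selection of $A$ on $S^* := S \cap S'$, that is, $\xi(f) \in A(f)$ for every $f \in S^*$, and let $(f_\alpha)_{\alpha \in I} \subseteq S^*$ converge pointwise to $f \in S_u$. Since $\sigma(f_\alpha) \le \xi(f_\alpha) \le \tau(f_\alpha)$ for all $\alpha \in I$, it follows from the above Corollary 2 (and the characterization of net-convergence in the order topologies, confer the proof of Corollary 2) that $\xi(f_\alpha) \to \sigma(f) = \tau(f)$ in $(\mathbb{R}, O_>)$ and in $(\mathbb{R}, O_<)$, which, as we know, is the same as $\xi(f_\alpha) \to \sigma(f) = \tau(f)$ in the natural topology $O_n$. In particular, every measurable selection of $A$ is continuous on the subspace $S_u$ with limit $\sigma = \tau$. We see here, and will see it another time later, that the class $S_u$ of convex functions with a unique minimizing point plays a special role.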
In addition to the topology $T$ of pointwise convergence, let $C$ also be endowed with the topology $T_{uc}$ of uniform convergence on compacta. It is well-known that $T \subseteq T_{uc}$, because uniform convergence on compacta implies pointwise convergence. From Theorem 10.8 of Rockafellar [21] we know that on $C$ the reverse is true. Notice that this is valid only for sequences. Thus the identity $i : (C, T) \to (C, T_{uc})$ is sequentially continuous at every $f \in C$. Unfortunately, in general topological spaces sequential continuity does not imply continuity, which in turn would give $T_{uc} = i^{-1}(T_{uc}) \subseteq T$ as desired. In fact the implication is true if the space is first countable, confer Theorem 7.1.3 in Singh [22]. At this stage, however, we do not know whether first countability holds for $(C, T)$. So, we will prove continuity of $i : (C, T) \to (C, T_{uc})$ traditionally via net-convergence, confer Theorem 4.2.6 in Singh [22]. Theorem 3 below on net-convergence is not only the key to success, but above all interesting in itself when compared with the sequential convergence occurring in Theorem 10.8 of Rockafellar [21]. The proof of Theorem 3 is based on the following inequality.
Lemma 2. Let $D$ be a dense subset of $\mathbb{R}$. Then for every compact set $K \subseteq \mathbb{R}$ there exist a constant $C$ and points $d_1, \ldots, d_8 \in D$ such that for each convex function $f : \mathbb{R} \to \mathbb{R}$ it follows:

Proof. First, find points $a$ and $b$ from $D$ such that $K \subseteq [a, b]$. By Theorem 1.3.7 in Niculescu and Persson [20] we have that: where and since Next, in (10) and in (11) we can choose the points $x$ and $y$ from $D$.
Similarly one obtains: To state our next result recall that (I, ≤) is a directed set.
Theorem 3. Let $D$ be dense in $\mathbb{R}$ and let $(f_\alpha)_{\alpha \in I}$ be a net in $C$ which converges pointwise on $D$ to a function $f$, that is $f_\alpha(t) \to f(t)$ for all $t \in D$. Then $f$ is convex and $(f_\alpha)$ converges uniformly to $f$ on every compact subset of $\mathbb{R}$.
Proof. Convexity of $f$ follows from the convexity of $f_\alpha$ by taking the limit. As to the second assertion let $K \subseteq \mathbb{R}$ be compact. By Lemma 2 there are a constant $C$ and By pointwise convergence we find for every 1 whence $F$ is pointwise bounded. Thus by the Arzelà-Ascoli theorem, confer, e.g., Heuser [16], the family $F$ is relatively compact. Therefore, if $(f_{\alpha'})$ is a subnet of $(f_\alpha)_{\alpha^* \le \alpha \in I}$, then there exists a further subnet $(f_{\alpha''})$ of $(f_{\alpha'})$ which converges to a function $g$ uniformly on $K$. In particular, $f_{\alpha''}(t) \to g(t)$ for all $t \in K$. But by the assumption of pointwise convergence we also know that $f_{\alpha''}(t) \to f(t)$ for all $t \in K$. Thus $g = f$ on $K$ and by the subnet-criterion it follows that $(f_\alpha)_{\alpha^* \le \alpha \in I}$ converges to $f$ uniformly on $K$, which a fortiori holds for the entire net $(f_\alpha)_{\alpha \in I}$.
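A quick numerical sanity check of Theorem 3 in the special case of a sequence is sketched below. The functions are assumptions made for illustration only: $f_n(x) = \sqrt{x^2 + 1/n}$ is convex and converges pointwise to $f(x) = |x|$, and on the compact set $K = [-2, 2]$ the largest gap is attained at $x = 0$ and equals $1/\sqrt{n}$. The grid-based supremum is of course only an approximation of the true supremum over $K$.

```python
import math


def f_n(x, n):
    """Assumed convex approximant sqrt(x^2 + 1/n) of |x|."""
    return math.sqrt(x * x + 1.0 / n)


def f(x):
    """Pointwise limit of f_n."""
    return abs(x)


def sup_gap_on_grid(n, lo=-2.0, hi=2.0, steps=400):
    """Largest value of |f_n - f| on an equidistant grid in [lo, hi]."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(abs(f_n(x, n) - f(x)) for x in grid)


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(n, sup_gap_on_grid(n))
```

The printed gaps shrink like $1/\sqrt{n}$, in line with uniform convergence on the compact set $K$.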
Infer from the above Theorem 3 that if a net $f_\alpha \to f$ in $(C, T)$, then $f_\alpha \to f$ in $(C, T_{uc})$. Obviously, the reverse is true as well. As a consequence we obtain Corollary 4.

Let $D = \{t_i : i \in \mathbb{N}\}$ be a countable and dense subset of $\mathbb{R}$. Introduce the special projection map $H : C \to \mathbb{R}^{\mathbb{N}}$ by $H(f) := (f(t_i))_{i \in \mathbb{N}}$. Note that $H$ depends on $D$, but we suppress this in our notation. Equip $\mathbb{R}^{\mathbb{N}}$ with the product topology $\Pi$. Denote the range $H(C)$ by $R$ and the relative topology $R \cap \Pi$ by $\mathcal{R}$. With the following result we will prove a functional limit theorem for convex stochastic processes.

Lemma 3. The map $H$ is a bijection onto its range and its inverse $H^{-1} : (R, \mathcal{R}) \to (C, T)$ is continuous.
Proof. If $H(f) = H(g)$, then $f = g$ on $D$ and by continuity and denseness of $D$ the equality holds on the entire real line. Thus $H$ is injective, and it is surjective onto its range by construction.
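As to continuity of the inverse consider a sequence $(r_n)$ with

$$r_n \to r \quad \text{in } (R, \mathcal{R}). \tag{13}$$

Since $(r_n) \subseteq R$ and $r \in R$, we find to each $n \in \mathbb{N}$ a function $f_n \in C$ with $H(f_n) = r_n$ and a function $f \in C$ with $H(f) = r$. Recall that convergence in $\Pi$ or in $\mathcal{R}$, respectively, is the same as coordinate-wise convergence. Thus the convergence in (13) means that $f_n(t_i) \to f(t_i)$ for all $i \in \mathbb{N}$.

Since $D$ lies dense in $\mathbb{R}$ we can apply Theorem 10.8 in Rockafellar [21] (or our Theorem 3) to infer that actually $f_n(t) \to f(t)$ for every $t \in \mathbb{R}$. Now, by definition $f_n = H^{-1}(r_n)$ and $f = H^{-1}(r)$, whence we arrive at $H^{-1}(r_n) \to H^{-1}(r)$ in $(C, T)$.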

Applications in probability and statistics
Let $(\Omega, \mathcal{A})$ be a measurable space. For a map $Z : \Omega \to C$ we write $Z(\omega, t) := Z(\omega)(t)$ for the value of the function $Z(\omega) : \mathbb{R} \to \mathbb{R}$ (trajectory) at the point $t \in \mathbb{R}$. Very often it is more convenient to write $Z(t)$ instead of $Z(\omega, t)$, since the ambiguity in the notation is resolved by the context. Let $\mathcal{B} := \mathcal{B}(\mathbb{R})$ denote the Borel-σ algebra on $\mathbb{R}$.
If $Z(\cdot, t) : \Omega \to \mathbb{R}$ is $\mathcal{A}$-$\mathcal{B}$ measurable for each $t \in \mathbb{R}$, then $Z$ is called a convex stochastic process. This is the same as saying that $Z(t)$ is a real random variable for all $t \in \mathbb{R}$. If $Z(\omega) \in U$ for all $\omega \in \Omega$, where $U$ is a subset of $C$, we say that $Z$ is a convex stochastic process in $U$ or, for short, a process in $U$. In other words, all trajectories of $Z$ are $U$-valued.
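Let $\mathcal{B}(C) := \sigma(T)$ be the Borel-σ algebra pertaining to the topology of pointwise convergence. By Corollary 4 it coincides with $\mathcal{B}_{uc}(C) := \sigma(T_{uc})$. The following result yields a convenient characterization of the Borel-σ algebra. Recall that by definition $\mathcal{C} = \sigma(\pi_t : t \in \mathbb{R})$.

Proposition 4. $\mathcal{B}(C) = \mathcal{C}$.

Proof. Let $C^* := \{f : \mathbb{R} \to \mathbb{R};\ f \text{ continuous}\}$ be endowed with the topology $T^*_{uc}$ of uniform convergence on compacta. One verifies easily that $T_{uc} = C \cap T^*_{uc}$ and hence

$$\mathcal{B}_{uc}(C) = \sigma(C \cap T^*_{uc}) = C \cap \sigma(T^*_{uc}),$$

where the second equality is ensured by Lemma 1.2.5 in Gänssler and Stute [15]. By Lemma A5.1 in Kallenberg [18] we have that $\sigma(T^*_{uc})$ is generated by the evaluation maps on $C^*$, whence $C \cap \sigma(T^*_{uc}) = \mathcal{C}$, and the assertion follows from Corollary 4.

Lemma 4. Every convex stochastic process $Z$ is $\mathcal{A}$-$\mathcal{B}(C)$ measurable. If $Z$ is a process in $U \subseteq C$, then $Z$ is $\mathcal{A}$-$\mathcal{B}(U)$ measurable, where $\mathcal{B}(U)$ is the Borel-σ algebra of the subspace $(U, U \cap T)$.

Proof. The first assertion follows from Proposition 4, which enables us to apply Proposition 1.2.11 in Gänssler and Stute [15]. The second assertion follows from $\mathcal{B}(U) = U \cap \mathcal{B}(C)$.

The concept of convergence in distribution is well-known for random variables with values in a metric space. A classical reference here is the book of Billingsley [1]. In contrast, the extension of the concept from metric spaces to topological spaces seems less known. It goes back to Gänssler and Stute [15], who in turn modify the ideas of Topsøe [23]. Let $Z$ and $Z_\alpha$, $\alpha \in I$, be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$ with values in some topological space $(X, O)$, that is $Z : \Omega \to X$ and $Z_\alpha : \Omega \to X$ are $\mathcal{A}$-$\mathcal{B}(X)$ measurable, where $\mathcal{B}(X) := \sigma(O)$ denotes the Borel-σ algebra. Then the net $(Z_\alpha)_{\alpha \in I}$ converges in distribution to $Z$ in $(X, O)$ if

$$\limsup_\alpha P(Z_\alpha \in F) \le P(Z \in F) \quad \text{for all } F \in \mathcal{F}, \tag{14}$$

where $\mathcal{F}$ is the family of all closed sets in $(X, O)$. This is denoted by $Z_\alpha \xrightarrow{D} Z$ in $(X, O)$.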
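The following result plays an important role in what follows. For this reason we state it here. The proof is comparatively simple and can be found in Gänssler and Stute [15], p. 345.

Theorem 5. (Continuous Mapping) Let $(X, O)$ and $(E, G)$ be topological spaces, $h : X \to E$ be $\mathcal{B}(X)$-$\mathcal{B}(E)$ measurable and $D_h := \{x \in X : h \text{ is discontinuous at } x\}$. Suppose $Z$ and $Z_\alpha$, $\alpha \in I$, are random variables over $(\Omega, \mathcal{A}, P)$ with values in $(X, O)$, where $P^*(Z \in D_h) = 0$ with $P^*$ the outer measure of $P$. Then $Z_\alpha \xrightarrow{D} Z$ in $(X, O)$ entails $h(Z_\alpha) \xrightarrow{D} h(Z)$ in $(E, G)$.

Let $U \subseteq X$ be endowed with the subspace topology $U \cap O$. Suppose $Z$ and $Z_\alpha$ for all $\alpha \in I$ are random variables with values in the subspace. Then

$$Z_\alpha \xrightarrow{D} Z \text{ in } (X, O) \iff Z_\alpha \xrightarrow{D} Z \text{ in } (U, U \cap O).$$

Indeed, the inclusion map $i : (U, U \cap O) \to (X, O)$ given by $i(x) = x$ is continuous. Thus the Continuous Mapping Theorem (CMT) shows sufficiency. The necessity follows from (14). We call this equivalence the Subspace-lemma.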
For further properties including the Portmanteau-Theorem we refer to chapter 8.4 in Gänssler and Stute [15].
Recall the left- and right-order topologies $O_<$ and $O_>$, which are not metrizable. If a net $(x_\alpha)_{\alpha \in I}$ converges in $(\mathbb{R}, O_<)$ and in $(\mathbb{R}, O_>)$, then it converges in the natural topology $O_n$, and vice versa. The following example shows that there is a counterpart for distributional convergence.
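Let $Z$ and $Z_\alpha$, $\alpha \in I$, be real random variables. Then:

$$Z_\alpha \xrightarrow{D} Z \text{ in } (\mathbb{R}, O_>) \iff \limsup_\alpha P(Z_\alpha \le x) \le P(Z \le x) \ \text{ for all } x \in \mathbb{R}, \tag{16}$$

$$Z_\alpha \xrightarrow{D} Z \text{ in } (\mathbb{R}, O_<) \iff \limsup_\alpha P(Z_\alpha \ge x) \le P(Z \ge x) \ \text{ for all } x \in \mathbb{R}, \tag{17}$$

$$Z_\alpha \xrightarrow{D} Z \text{ in } (\mathbb{R}, O_<) \text{ and in } (\mathbb{R}, O_>) \iff Z_\alpha \xrightarrow{D} Z \text{ in } (\mathbb{R}, O_n). \tag{18}$$

Here, (16) and (17) are immediate consequences of the definitions. In (18) the sufficiency of the right side follows from $O_n \supseteq O_<$ and $O_n \supseteq O_>$. To see necessity let $x \in \mathbb{R}$. Then we obtain:

$$P(Z < x) \le \liminf_\alpha P(Z_\alpha < x) \le \limsup_\alpha P(Z_\alpha < x) \le \limsup_\alpha P(Z_\alpha \le x) \le P(Z \le x),$$

where the first inequality holds by (17) and complementation and the last one by (16).

Thus, if the distribution function of $Z$ is continuous at $x$, i.e., $P(Z < x) = P(Z \le x)$, it follows that $\lim_\alpha P(Z_\alpha \le x) = P(Z \le x)$ as required.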
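Deduce from (16): If $Z_\alpha \le Z^*_\alpha$ $P$-almost surely (a.s.) for all $\alpha \ge \alpha_0 \in I$ or if $Z \ge Z^*$ a.s., then $Z_\alpha \xrightarrow{D} Z$ in $(\mathbb{R}, O_>)$ entails $Z^*_\alpha \xrightarrow{D} Z$ or $Z_\alpha \xrightarrow{D} Z^*$ in $(\mathbb{R}, O_>)$, respectively. In particular, this shows that the limit variables are not unique.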
Our next result characterizes distributional convergence for random variables with values in the function space $(C, T)$. Below the Euclidean space $\mathbb{R}^k$ is endowed with the product topology $O_n^k$.
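Proposition 6. (Functional limits) Let $Z$ and $Z_\alpha$, $\alpha \in I$, be convex stochastic processes. Then they are random variables in $(C, T)$ and the following statements (19) and (20) are equivalent:

$$Z_\alpha \xrightarrow{D} Z \text{ in } (C, T), \tag{19}$$

$$(Z_\alpha(t_1), \ldots, Z_\alpha(t_k)) \xrightarrow{D} (Z(t_1), \ldots, Z(t_k)) \text{ in } (\mathbb{R}^k, O_n^k) \tag{20}$$

for every $k \in \mathbb{N}$ and for each collection of points $t_1, \ldots, t_k \in D$, where $D$ is a countable and dense subset of $\mathbb{R}$.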
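Proof. That (19) implies (20) follows from the CMT, because the map $f \mapsto (f(t_1), \ldots, f(t_k))$ is continuous on $(C, T)$. For the converse first note that by countability we have that $D = \{t_1, t_2, \ldots\}$. Recall the projection map $H$ given in Lemma 3. It follows from Example 2.6 in Billingsley [2] that (20) entails that $H(Z_\alpha) \xrightarrow{D} H(Z)$ in $(\mathbb{R}^{\mathbb{N}}, \Pi)$. Now, the Subspace-lemma says that $H(Z_\alpha) \xrightarrow{D} H(Z)$ in $(R, \mathcal{R})$, where $R = H(C)$ carries the relative topology $\mathcal{R}$. By Lemma 3 the inverse $H^{-1} : (R, \mathcal{R}) \to (C, T)$ is continuous, so that another application of the CMT yields (19).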
The second statement (20) is known as convergence of the finite dimensional distributions (on $D$) and denoted by $Z_\alpha \xrightarrow{fd}_D Z$ (in short: convergence of the fidis).
We are now in the position to formulate several so-called Argmin theorems for convex stochastic processes.

Theorem 7. Let $D$ be countable and dense in $\mathbb{R}$. Consider convex stochastic processes $Z$ and $Z_\alpha$, $\alpha \in I$, in $U$. Suppose that $Z_\alpha \xrightarrow{fd}_D Z$. Then the following statements hold: If in addition $Z$ is a process in $S^*$ with $Z \in S_u$ a.s., then $\sigma(Z_\alpha)$

Consequently by (21) another application of the CMT gives the second convergence in (1).
A comparison shows that on the one hand in (3) the uniqueness condition is weaker than in (1) or in (2), on the other hand in (3) the demands on the stochastic processes are more stringent.
In the following we give some interesting equivalent characterizations for almost sure uniqueness of the minimizing point.

Proposition 8. Suppose $Z$ is a convex stochastic process in $S^*$. Then the following three statements are equivalent:

Firstly, the publications mentioned above only consider sequences, whereas we more generally allow for nets of processes. Secondly, a further main difference is that we also provide results when the limit process does not have a unique minimizing point. Incidentally, both points also apply to the Argmin theorems for almost sure convergence. The idea behind this is to understand semi-continuity of $\sigma$ or $\tau$ as continuity with respect to the order topologies in place of the natural topology on $\mathbb{R}$. After that the Continuous Mapping Theorem does the rest. Another way of looking at our approach is this: As long as a unique minimizer of the limit process exists, it is a natural candidate for the limit variable. If this is not the case, we do not search for new candidates, but simply make the topology on $\mathbb{R}$ smaller. This also differs from Ferger's [7] innovative approach, which retains the natural topology but more generally allows Choquet-capacities to play the part of limit "distributions".

The applicability of the Argmin theorem for convex processes lies in the fact that here, in contrast to such processes in larger function spaces, the only prerequisite is that of convergence of the finite dimensional distributions, confer Ferger [9] for an application in M-estimation. In larger spaces such as $C^*$, for example, and even if one only considers sequences, one needs not only a functional limit theorem $Z_n \xrightarrow{D} Z$ in $(C^*, T^*_{uc})$, but also stochastic boundedness of the $\xi(Z_n)$:

$$\lim_{a \to \infty} \limsup_{n \to \infty} P(|\xi(Z_n)| > a) = 0, \tag{22}$$

see Ibragimov and Hasminski (1981) [13], van der Vaart and Wellner (1996) [24] or Ferger (2015) [8]. For the proof of the functional limit theorem alone, besides convergence of the fidis also tightness of the sequence $(Z_n)$ is required, which is usually established through maximal inequalities. Not to forget the proof of (22), usually by upper estimates for the tail probabilities. This means that the programme that has to be worked through is much more extensive and demanding than in the convex case. There are two answers to the question why this is so. Firstly, by Proposition 6 convergence of the fidis alone already entails a functional limit theorem. Secondly, there is no counterpart of Corollary 3, which says that $\sigma$ is continuous on $S_u$. For example, let us consider $(C^*, T^*_{uc})$ and let $\sigma^*(f)$ be the smallest minimizing point of $f$ (existence assumed). Then one can construct a sequence $(f_n)$ such that $f_n$ converges to $f$ uniformly on every compact $K \subseteq \mathbb{R}$, i.e., $f_n \to f$ in $(C^*, T^*_{uc})$, but $\sigma^*(f_n) \to -\infty$. In particular, $\sigma^*$ is far from being continuous on $S_u$.
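One way to build such a sequence in the non-convex setting is sketched below. The concrete functions are assumptions chosen for illustration and are not taken from the text: $f(x) = x^2/(1+x^2)$ is continuous with unique minimizer $0$, and $f_n$ subtracts a triangular dip of depth $2$ centered at $-n$. On every compact set the dip eventually leaves, so $f_n = f$ there for large $n$, while the smallest minimizer of $f_n$ sits at $-n$. The minimizer is located on a grid only, which suffices here because $-n$ is a grid point.

```python
def f(x):
    """Assumed continuous, non-convex limit function with unique minimizer 0."""
    return x * x / (1.0 + x * x)


def f_n(x, n):
    """f minus a triangular dip of depth 2 and half-width 1 centered at -n."""
    return f(x) - 2.0 * max(0.0, 1.0 - abs(x + n))


def smallest_minimizer_on_grid(n):
    """Smallest grid minimizer of f_n on [-n - 5, 5] with mesh 0.01."""
    grid = [k / 100 for k in range(-100 * (n + 5), 501)]
    # min() returns the first minimal element, i.e. the smallest minimizer.
    return min(grid, key=lambda x: f_n(x, n))


def max_gap_on_compact(n, radius=2):
    """Largest value of |f_n - f| on a grid in [-radius, radius]."""
    grid = [k / 100 for k in range(-100 * radius, 100 * radius + 1)]
    return max(abs(f_n(x, n) - f(x)) for x in grid)


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, smallest_minimizer_on_grid(n), max_gap_on_compact(n))
```

In the convex case such an escaping dip is impossible, which is one way to read the special role of convexity in the Argmin theorems above.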

Corollary 4. The topology of pointwise convergence and the topology of uniform convergence on compacta coincide: $T = T_{uc}$.

Remark 3. Let $T(D)$ be the topology of pointwise convergence on $D$. It is generated by the projections $\pi_t$, $t \in D$. If $D$ is dense in $\mathbb{R}$, then Theorem 3 actually yields that $T(D) = T_{uc} = T$. So all the topologies match. The observation in Remark 3 leads to the following variant of the semi-continuity.

Corollary 5. Let $D$ be dense in $\mathbb{R}$. If in Corollary 2 the assumption is replaced by "$(f_\alpha)_{\alpha \in I} \subseteq C$ converges pointwise to $f$ on $D$", then all statements of Corollary 2 remain valid.
in combination with the first assertion.

Recall that $S^* = S \cap S'$ and $\xi : (S^*, \mathcal{C}_{S^*}) \to (\mathbb{R}, \mathcal{B})$ denotes any measurable selection of $A$.

Corollary 6. If $Z$ is a convex stochastic process in $S$, in $S'$ or in $S^*$, then $\sigma(Z)$, $\tau(Z)$ or $\xi(Z)$, respectively, are real random variables.

Proof. Lemma 4 says that $Z : (\Omega, \mathcal{A}) \to (S, \mathcal{B}(S))$ is measurable. But $\mathcal{B}(S) = S \cap \mathcal{B}(C) = S \cap \mathcal{C} = \mathcal{C}_S$, where the second equality holds by Proposition 4. From Proposition 2 we know that $\sigma : (S, \mathcal{C}_S) \to (\mathbb{R}, \mathcal{B})$ is measurable, whence $\sigma(Z) = \sigma \circ Z$ is measurable as a composition of measurable maps. Replacing $S$ by $S'$ or $S^*$ gives measurability of $\tau(Z)$ or $\xi(Z)$, respectively.

Proof. First notice that by Corollary 6 all involved maps are real random variables. By Proposition 6, $Z_\alpha \xrightarrow{D} Z$ in $(C, T)$, whence by the Subspace-lemma

$$Z_\alpha \xrightarrow{D} Z \text{ in } (S, T_S). \tag{21}$$

From Corollary 1 we know that $\sigma : (S, T_S) \to (\mathbb{R}, O_>)$ is continuous and consequently the CMT yields the first convergence in (1). By Lemma 1, $S_u \in \mathcal{C}_{S^*}$. According to Lemma 4, $Z$ is $\mathcal{A}$-$\mathcal{C}_{S^*}$ measurable upon noticing that $\mathcal{B}(S^*) = S^* \cap \mathcal{B}(C) = S^* \cap \mathcal{C} = \mathcal{C}_{S^*}$ by Proposition 4. Thus $\{Z \in S_u\} \in \mathcal{A}$, whence by Corollary 3 it follows that $0 \le P^*(Z \in D_\sigma) \le P^*(Z \notin S_u) = P(Z \notin S_u) = 0$. Finally, by Propositions 2 and 4 σ : (S, T