1 Introduction and a basic result

We start with some foundational definitions and a few preliminary remarks. Let \(f:\mathbb {R} \rightarrow \mathbb {R}\) be a convex function with associated set

$$\begin{aligned} A(f):=\left\{ \hspace{0.55542pt}t \in \mathbb {R}\,{:}\hspace{0.55542pt}f(t)=\inf _{s \in \mathbb {R}}f(s)\right\} \end{aligned}$$
(1.1)

of all minimizing points. This minimum set can equivalently be described in terms of the right and left derivatives \(D^+f\) and \(D^-f\) of f. Indeed, it follows from Theorem 23.2 of Rockafellar [21] that

$$\begin{aligned} A(f)= \{\hspace{0.55542pt}t \in \mathbb {R}: D^-f(t) \leqslant 0 \leqslant D^+f(t)\}. \end{aligned}$$
(1.2)

Alternatively, see Corollary 7.2 in Ferger [10], where we give an elementary proof. Of course, it can happen that A(f) is empty. However, if this is not the case, it is well known (and actually easy to see) that then A(f) is closed and convex and hence is a closed interval. The extreme case \(A(f)=\mathbb {R}\) occurs if and only if f is a constant function. So, as long as f is not a constant function there are three possibilities: (i) \(A(f)=[a,b]\), \(a \leqslant b \in \mathbb {R}\), (ii) \(A(f)= [a,\infty )\), \(a \in \mathbb {R}\), or (iii) \(A(f)=(-\infty ,b]\), \(b \in \mathbb {R}\).

Let \(C:=\{f:\mathbb {R} \rightarrow \mathbb {R}; f \text { convex}\}\) be the class of all convex functions. Introduce

$$\begin{aligned} S&:=\{f \in C\,{:}\, A(f) \text { as in (i) or (ii)}\},\\ S'&:=\{f \in C\,{:}\, A(f) \text { as in (i) or (iii)}\}. \end{aligned}$$

Thus S and \(S'\) consist exactly of those functions for which the smallest and largest minimizer, respectively, exist. Consequently the functionals \(\sigma :S \rightarrow \mathbb {R}\) and \(\tau :S'\! \rightarrow \mathbb {R}\) given by \(\sigma (f)=\min A(f)\) and \(\tau (f)=\max A(f)\) are well defined on their domains.

Our first main result provides a necessary and sufficient condition for the location of the smallest and the largest minimizer, respectively. It is rather simple, with an elementary proof, but from it we will draw a whole series of useful conclusions. In Sect. 2 it is shown that \(\sigma \) and \(\tau \) are measurable and semi-continuous. On the set \(S_{\textrm{u}}\) of all convex functions with exactly one minimizer the functionals \(\sigma \) and \(\tau \) coincide and are furthermore continuous there. In Sect. 3 this is used in combination with Continuous Mapping Theorems to derive several Argmin theorems for convex stochastic processes. For a further discussion of our findings we refer to the concluding remarks at the end of Sect. 3.

Theorem 1.1

Let \(\sigma (f)=\min A(f)\) and \(\tau (f)=\max A(f)\) be the smallest and largest minimizing point of the convex function f. Then the following equivalences hold for every \(x \in \mathbb {R}\):

$$\begin{aligned}&\sigma (f) \leqslant x \quad \Longleftrightarrow \quad D^+f(x) \geqslant 0, \end{aligned}$$
(1.3)
$$\begin{aligned}&\tau (f) \geqslant x \quad \Longleftrightarrow \quad D^-f(x) \leqslant 0. \end{aligned}$$
(1.4)

Proof

For the proof of (1.3) we briefly write \(\sigma \) for \(\sigma (f)\). Recall that \(D^+f\) and \(D^-f\) are non-decreasing, confer, e.g., Theorem 1.3.3 in Niculescu and Persson [20]. Thus \(\sigma \leqslant x\) entails \(D^+f(x) \geqslant D^+f(\sigma ) \geqslant 0\), where the last inequality follows from (1.2), because \(\sigma \in A(f)\). To see the reverse conclusion in (1.3) we use that by [20, Theorem 1.3.1] the difference quotients \(({f(t)-f(x)})/({t-x})\) are non-increasing as \(t\downarrow x\). Therefore we obtain

$$\begin{aligned} 0 \leqslant D^+f(x)=\inf _{t>x}\frac{f(t)-f(x)}{t-x} \leqslant \frac{f(t)-f(x)}{t-x} \quad \text {for all}\; \; t>x. \end{aligned}$$
(1.5)

Multiplication with the positive difference \(t-x\) yields that \(f(t) \geqslant f(x)\) for all \(t>x\) and so

$$\begin{aligned} \inf _{t>x}f(t) \geqslant f(x) \geqslant \inf _{t \leqslant x}f(t), \end{aligned}$$

where the second inequality is trivial. To sum up we arrive at

$$\begin{aligned} \inf _{t \in \mathbb {R}} f(t)=\inf _{t \leqslant x} f(t). \end{aligned}$$
(1.6)

Next we consider a point \(t < \sigma \). Infer from the monotonicity of \(D^-f\) that \(D^-f(t) \leqslant D^-f(\sigma ) \leqslant 0\), where the last inequality is ensured by (1.2). Just as for \(D^+f\), we see that \(D^-f(t)=\sup _{s<t}({f(s)-f(t)})/({s-t})\), whence in view of \(D^-f(t) \leqslant 0\) we have that

$$\begin{aligned} \frac{f(s)-f(t)}{s-t} \leqslant 0 \quad \text {for all}\;\; s<t. \end{aligned}$$

Multiplication with the negative difference \(s-t\) gives \(f(s) \geqslant f(t)\) for all \(s<t < \sigma \). Since f is continuous, taking the limit \(t \uparrow \sigma \) finally shows that f is non-increasing on the closed half-line \((-\infty ,\sigma ]\). In particular, \(f(t) \geqslant f(\sigma )\) for all \(t \leqslant \sigma \). Actually, on the open interval \((-\infty ,\sigma )\) the inequality is strict:

$$\begin{aligned} f(t) > f(\sigma ) \quad \text {for all}\; \; t < \sigma . \end{aligned}$$
(1.7)

This is because otherwise there exists some \(t_0 < \sigma \) with \(f(t_0) \leqslant f(\sigma )\). Since \(\sigma \) is a minimizing point, also \(f(t_0) \geqslant f(\sigma )\), so that \(t_0\) is a minimizing point as well. This contradicts the minimality of \(\sigma \).

Now, assume that \(x< \sigma \). Deduce from the fact that f is non-increasing on \((-\infty ,x] \subseteq (-\infty ,\sigma ]\) that \(f(t) \geqslant f(x)\) for all \(t \leqslant x\). Consequently, \(\inf _{t \leqslant x}f(t) \geqslant f(x)\) and therefore by (1.6) and (1.7):

$$\begin{aligned} f(\sigma )=\inf _{t \in \mathbb {R}} f(t) = \inf _{t \leqslant x} f(t) \geqslant f(x) > f(\sigma ). \end{aligned}$$

This is a contradiction and thus \(x \geqslant \sigma \) is true as desired.

For the proof of (1.4) we use a time-reversing argument. Introduce the function \(f_{-}\) defined by \(f_{-}(t):=f(-t)\), \(t \in \mathbb {R}\). One easily verifies that \(f_{-}\) is convex and that \(\tau (f)=-\sigma (f_{-})\). We obtain by (1.3):

$$\begin{aligned} \tau (f) \geqslant x \;\; \Longleftrightarrow \;\; {}-\sigma (f_{-}) \geqslant x \;\; \Longleftrightarrow \;\; \sigma (f_{-}) \leqslant {}-x \;\; \Longleftrightarrow \;\; D^+f_{-}(-x) \geqslant 0 \end{aligned}$$

and the assertion follows upon noticing that \(D^+f_{-}(-x)=-D^-f(x)\).\(\square \)
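For completeness, the identity \(D^+f_{-}(-x)=-D^-f(x)\) used in the last step can be verified directly via the substitution \(s=-t\):

$$\begin{aligned} D^+f_{-}(-x)=\inf _{t>-x}\frac{f_{-}(t)-f_{-}(-x)}{t-(-x)}=\inf _{s<x}\frac{f(s)-f(x)}{x-s}=-\sup _{s<x}\frac{f(s)-f(x)}{s-x}=-D^-f(x). \end{aligned}$$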

Remark 1.2

Provided that the minimizers exist, one has:

$$\begin{aligned} \mathrm{(a)} \;\; \sigma (f)\leqslant x,\; \tau (f) \geqslant x \quad \Longleftrightarrow \quad \mathrm{(b)} \;\; D^+f(x) \geqslant 0,\; D^-f(x) \leqslant 0. \end{aligned}$$
(1.8)

Indeed, recall that \(D^+f\) and \(D^-f\) are non-decreasing, which by (1.2) shows necessity of (b). By another application of (1.2) the point x in (b) is a minimizer, \(x \in A(f)\), and (a) follows using the minimum and maximum property of \(\sigma (f)=\min A(f)\) and \(\tau (f)=\max A(f)\), respectively.

Note that we cannot readily infer Theorem 1.1 from (1.8). For example, if we only know that \(D^+f(x) \geqslant 0\) holds, then x need not be a minimizing point in general, as can be seen from simple examples such as \(f(x)=x^2\) at any \(x>0\). Consequently, the argument via the minimum property of \(\sigma (f)\) fails.
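Theorem 1.1 and the characterization (1.2) can be made concrete with a small numerical check. The following Python sketch is purely illustrative: the test function \(f(t)=\max \hspace{0.55542pt}(|t|-1,0)\) with \(A(f)=[-1,1]\), hence \(\sigma (f)=-1\) and \(\tau (f)=1\), and the sample grid are our own choices, not taken from the text. The one-sided derivatives are encoded in closed form.

```python
# Numerical check of (1.2), (1.3) and (1.4) for the convex function
# f(t) = max(|t| - 1, 0), whose minimum set is A(f) = [-1, 1],
# so that sigma(f) = -1 and tau(f) = 1.  (Illustrative example.)

def d_plus(t):
    # right derivative of f: -1 on (-inf, -1), 0 on [-1, 1), 1 on [1, inf)
    return -1.0 if t < -1 else (0.0 if t < 1 else 1.0)

def d_minus(t):
    # left derivative of f: -1 on (-inf, -1], 0 on (-1, 1], 1 on (1, inf)
    return -1.0 if t <= -1 else (0.0 if t <= 1 else 1.0)

sigma, tau = -1.0, 1.0
grid = [i / 4 for i in range(-12, 13)]          # sample points in [-3, 3]

assert all((sigma <= x) == (d_plus(x) >= 0) for x in grid)                  # (1.3)
assert all((tau >= x) == (d_minus(x) <= 0) for x in grid)                   # (1.4)
assert all((d_minus(x) <= 0 <= d_plus(x)) == (-1 <= x <= 1) for x in grid)  # (1.2)
print("equivalences (1.2), (1.3), (1.4) verified on the grid")
```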

For every non-decreasing function \(F:\mathbb {R} \rightarrow \mathbb {R}\) we introduce the generalized inverses:

$$\begin{aligned} F^\wedge (y):=\inf \hspace{1.111pt}\{x \in \mathbb {R}\,{:}\, F(x) \geqslant y \} \; \ \text {and} \; \ F^\vee (y):=\sup \hspace{1.111pt}\{x \in \mathbb {R}\,{:}\, F(x) \leqslant y \}, \; \;y \in \mathbb {R}. \end{aligned}$$

For properties of these inverses, confer Embrechts and Hofert [6], Feng et al. [7] or Fortelle [4]. Notice that (1.3) is the same as \([\sigma (f),\infty )= \{x \in \mathbb {R}\,{:}\, D^+f(x) \geqslant 0 \}\) and hence

$$\begin{aligned} \sigma (f)= (D^+f)^\wedge (0). \end{aligned}$$

Similarly, \((-\infty ,\tau (f)]=\{x \in \mathbb {R}\,{:}\, D^-f(x) \leqslant 0 \}\) by (1.4), whence

$$\begin{aligned} \tau (f)= (D^-f)^\vee (0). \end{aligned}$$
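These two identities lend themselves to a direct discretized computation. In the following sketch (again illustrative; the function \(f(t)=\max \hspace{0.55542pt}(|t|-1,0)\) and the grid resolution are ad hoc choices) the generalized inverses of the one-sided derivatives are evaluated at 0 on a grid, recovering \(\sigma (f)=-1\) and \(\tau (f)=1\).

```python
# sigma(f) = (D+ f)^wedge(0) and tau(f) = (D- f)^vee(0), evaluated on a
# grid for f(t) = max(|t| - 1, 0).  The one-sided derivatives are
# non-decreasing, so the generalized inverses reduce to grid minima/maxima.

def d_plus(t):
    return -1.0 if t < -1 else (0.0 if t < 1 else 1.0)

def d_minus(t):
    return -1.0 if t <= -1 else (0.0 if t <= 1 else 1.0)

grid = [i / 100 for i in range(-300, 301)]       # step 0.01 on [-3, 3]

# F^wedge(0) = inf{x : F(x) >= 0}   and   F^vee(0) = sup{x : F(x) <= 0}
sigma = min(x for x in grid if d_plus(x) >= 0)
tau = max(x for x in grid if d_minus(x) <= 0)

print(sigma, tau)   # -1.0 1.0
```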

2 Measurability, semi-continuity and continuity of the argmin-functionals

Recall that C is the class of all convex functions on \(\mathbb {R}\). For each \(t \in \mathbb {R}\) let \(\pi _t:C \rightarrow \mathbb {R}\) denote the projection (evaluation map) at t, that is \(\pi _t(f)=f(t)\). We endow the function space C with the \(\sigma \)-algebra \(\mathscr {C}\) generated by the projections: \(\mathscr {C}:=\sigma (\pi _t\,{:}\, t \in \mathbb {R})\). Furthermore, C is equipped with the topology \(\mathscr {T}\) of pointwise convergence, which is known to be generated by the projections: \(\mathscr {T}:=\tau (\pi _t\,{:}\, t \in \mathbb {R})\). Recall that \(\mathscr {T}\) is the smallest topology on C for which all projections are continuous. Note that the trace \(\mathscr {C}_S:=S \cap \mathscr {C}\) in S is generated by the restrictions of \(\pi _t\) to S. Analogously, the subspace topology \(\mathscr {T}_S\! = S \cap \mathscr {T}\) on S is generated by these restrictions. The corresponding statements hold for \(S^\prime \) endowed with the trace \(\mathscr {C}_{S'}\!=S' \!\cap \mathscr {C}\) and the subspace topology \(\mathscr {T}_{S'}\!=S'\! \cap \mathscr {T}\).

Proposition 2.1

  1. (i)

    \(\sigma :S \rightarrow \mathbb {R}\) is \(\mathscr {C}_S\)-Borel measurable and \(\mathscr {T}_S\)-lower semi-continuous.

  2. (ii)

    \(\tau :S' \rightarrow \mathbb {R}\) is \(\mathscr {C}_{S'}\)-Borel measurable and \(\mathscr {T}_{S'}\)-upper semi-continuous.

Proof

For each \(x \in \mathbb {R}\) we have that:

$$\begin{aligned} \{f \in S\,{:}\, \sigma (f) \leqslant x \}&=\{f \in S\,{:}\, D^+f(x) \geqslant 0\}&&\text {by Theorem 1.1}\\&=\biggl \{f \in S\,{:}\, \inf _{t>x}\frac{f(t)-f(x)}{t-x} \geqslant 0\biggr \}&&\text {by the equality in (1.5)}\\&=\biggl \{f \in S\,{:}\, \inf _{\mathbb {Q}\ni t>x}\frac{f(t)-f(x)}{t-x} \geqslant 0\biggr \}&&\text {by continuity of } f\\&= \bigcap _{\mathbb {Q}\ni t>x} \biggl \{f \in S\,{:}\, \frac{f(t)-f(x)}{t-x} \geqslant 0\biggr \}\\&= \bigcap _{\mathbb {Q}\ni t>x} \{f \in S\,{:}\, f(t)-f(x) \geqslant 0\}\\&= \bigcap _{\mathbb {Q}\ni t>x} \bigl (S \cap (\pi _t-\pi _x)^{-1}([0,\infty ))\bigr )\\&= S \cap \bigcap _{\mathbb {Q}\ni t>x}\! (\pi _t-\pi _x)^{-1}([0,\infty )). \end{aligned}$$

By construction of \(\mathscr {C}\) every projection is \(\mathscr {C}\)-measurable, whence the differences \(\pi _t-\pi _x\) are \(\mathscr {C}\)-measurable as well and therefore \((\pi _t-\pi _x)^{-1}([0,\infty )) \in \mathscr {C}\) for all rationals \(t>x\). Since \(\mathscr {C}\) is closed under denumerable intersections we arrive at \(\{f \in S\,{:}\sigma (f) \leqslant x \} \in S \cap \mathscr {C} = \mathscr {C}_S\) for all \(x \in \mathbb {R}\), which by Lemma 1.4 in Kallenberg [16] shows measurability of \(\sigma \).

As to semi-continuity recall that by construction of \(\mathscr {T}\) every projection \(\pi _t\) is \(\mathscr {T}\)-continuous, whence the differences \(\pi _t-\pi _x\) are \(\mathscr {T}\)-continuous as well and therefore the sets \((\pi _t-\pi _x)^{-1}([0,\infty ))\) are \(\mathscr {T}\)-closed for all \(t>x\). Since arbitrary intersections of closed sets are closed, we arrive at the conclusion that \(\{f \in S\,{:}\, \sigma (f) \leqslant x \}\) is \(\mathscr {T}_S\)-closed for all \(x \in \mathbb {R}\), which shows \(\mathscr {T}_S\)-lower semi-continuity of \(\sigma \).

The second part follows in the same way. Indeed, since

$$\begin{aligned} D^-f(x)=\sup _{t<x}\frac{f(t)-f(x)}{t-x} \end{aligned}$$

it follows analogously that

$$\begin{aligned} \{f \in S'\,{:}\, \tau (f) \geqslant x \} = S'\! \cap \! \bigcap _{\mathbb {Q}\ni t<x}\! (\pi _t-\pi _x)^{-1}([0,\infty )). \end{aligned}$$

\(\square \)

Next we give further equivalent characterizations of semi-continuity. The first one is an immediate consequence of Proposition 2.1 and the definition of continuity (pre-images of open sets are open).

Corollary 2.2

Let \(\mathscr {O}_<:=\{(-\infty ,x)\,{:}\, x \in \mathbb {R}\} \cup \{\varnothing , \mathbb {R}\}\) and \(\mathscr {O}_>:=\{(x,\infty )\,{:}\, x \in \mathbb {R}\}\cup \{\varnothing , \mathbb {R}\}\) be the left-order topology and the right-order topology. Then:

  1. (1)

    \(\sigma :(S,\mathscr {T}_S) \rightarrow (\mathbb {R}, \mathscr {O}_>)\) is continuous.

  2. (2)

    \(\tau :(S'\!,\mathscr {T}_{S'}) \rightarrow (\mathbb {R}, \mathscr {O}_<)\) is continuous.

Sometimes it is advantageous to consider the restrictions of \(\sigma \) and \(\tau \) on subspaces.

Remark 2.3

Let \(\varnothing \ne U \subseteq S\) be endowed with \(\mathscr {C}_U:=U \cap \mathscr {C}\) and \(\mathscr {T}_U:=U \cap \mathscr {T}\). Then \(\sigma :U \rightarrow \mathbb {R}\) is \(\mathscr {C}_U\)-Borel measurable and \(\sigma :(U,\mathscr {T}_U) \rightarrow (\mathbb {R}, \mathscr {O}_>)\) is continuous. Similarly, if \(U \subseteq S'\!\), then \(\tau :U \rightarrow \mathbb {R}\) is \(\mathscr {C}_U\)-Borel measurable and \(\tau :(U,\mathscr {T}_U) \rightarrow (\mathbb {R}, \mathscr {O}_<)\) is continuous.

Corollary 2.2 in turn yields a second equivalent description of semi-continuity via net-convergence. For this purpose, here and in the following let \((I,\leqslant )\) be a directed set. Also recall the definition

$$\begin{aligned} S_{\textrm{u}} :=\bigl \{f \in C\,{:}\, f \text { has a unique minimizing point}\bigr \}. \end{aligned}$$
(2.1)

Corollary 2.4

Assume that \((f_\alpha )_{\alpha \in I} \subseteq C\) converges pointwise to f on \(\mathbb {R}\). Then the following statements apply:

  1. (1)

    If \((f_\alpha )_{\alpha \in I} \subseteq S\) and \(f \in S\), then \(\liminf _{\alpha } \sigma (f_\alpha ) \geqslant \sigma (f)\).

  2. (2)

    If \((f_\alpha )_{\alpha \in I} \subseteq S'\) and \(f \in S'\!\), then \(\limsup _{\alpha } \tau (f_\alpha ) \leqslant \tau (f)\).

  3. (3)

    If \((f_\alpha )_{\alpha \in I} \subseteq S \cap S'\) and \(f \in S_{\textrm{u}}\), then \(\lim _{\alpha } \sigma (f_\alpha )= \sigma (f)\) and \(\lim _{\alpha } \tau (f_\alpha )= \tau (f)\). Note that \(\sigma (f)=\tau (f)\), because \(f \in S_{\textrm{u}}\). Thus the smallest minimizer and the largest minimizer converge to the same limit.

Proof

By assumption, \(f_\alpha \rightarrow f\) in \((C,\mathscr {T})\), which by the requirement in (1) is the same as \(f_\alpha \rightarrow f\) in \((S,\mathscr {T}_S)\). According to Corollary 2.2, \(\sigma :(S,\mathscr {T}_S) \rightarrow (\mathbb {R}, \mathscr {O}_>)\) is continuous at every point \(f \in S\). Consequently \(\sigma (f_\alpha ) \rightarrow \sigma (f)\) in \((\mathbb {R}, \mathscr {O}_>)\). Now, a net \((y_\alpha )\) converges in \((\mathbb {R},\mathscr {O}_>)\) to y if and only if \(\liminf _{\alpha } y_\alpha \geqslant y\), which gives (1). In the same way one obtains (2) upon noticing that \(y_\alpha \rightarrow y\) in \((\mathbb {R},\mathscr {O}_<)\) if and only if \(\limsup _{\alpha } y_\alpha \leqslant y\). Finally, (3) follows from (1) and (2), because \(\sigma \leqslant \tau \) and therefore

$$\begin{aligned} \sigma (f) \leqslant \liminf _{\alpha } \sigma (f_\alpha ) \leqslant \limsup _{\alpha } \sigma (f_\alpha ) \leqslant \limsup _{\alpha } \tau (f_\alpha ) \leqslant \tau (f) = \sigma (f). \end{aligned}$$

This shows that \(\sigma (f_\alpha ) \rightarrow \sigma (f)\). Similarly

$$\begin{aligned} \sigma (f) \leqslant \liminf _{\alpha } \sigma (f_\alpha ) \leqslant \liminf _{\alpha } \tau (f_\alpha ) \leqslant \limsup _{\alpha } \tau (f_\alpha ) \leqslant \tau (f) = \sigma (f) \end{aligned}$$

resulting in \(\tau (f_\alpha ) \rightarrow \tau (f)\).\(\square \)

Semi-continuity of \(\sigma \) and \(\tau \) as stated in Proposition 2.1 and its reformulations in Corollaries 2.2 and 2.4 turn out to be a very strong tool for proving so-called Argmin theorems in probability and statistics.

Occasionally it is stated in the literature that \(\sigma \) or \(\tau \) are actually continuous with respect to the natural topology \(\mathscr {O}_n\) on \(\mathbb {R}\). But the following example shows that this is not true.

Example 2.5

Consider

$$\begin{aligned} f(t) = {\left\{ \begin{array}{ll} 0, &{}\quad |t| \leqslant 1,\\ |t|-1, &{}\quad |t|>1 \end{array}\right. } \end{aligned}$$

and for every \(n \in \mathbb {N}\) let

$$\begin{aligned} f_n(t) = {\left\{ \begin{array}{ll} f(t), &{}\quad t<0 \text { or } t>1+{1}/{n},\\ t/({n+1}), &{}\quad t \in [0,1+{1}/{n}]. \end{array}\right. } \end{aligned}$$

Obviously, f and \(f_n\), \(n \in \mathbb {N}\), are convex and \(f_n\) converges at every point (actually uniformly on \(\mathbb {R}\)) to f. However, \(\tau (f_n)=0\) for all \(n \in \mathbb {N}\), whereas \(\tau (f)=1\) and consequently \(\tau (f_n) \not \rightarrow \tau (f)\). Thus \(\tau \) is not continuous at f and from \(\sigma (f)=-\tau (f_{-})\) we infer that \(\sigma \) is not continuous at \(f_{-}\).
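The example is easily reproduced numerically. In the sketch below the functions are exactly those of Example 2.5 (with \(n=10\)); the grid-based search for the largest minimizer is an illustrative device of our own.

```python
# Reproduces Example 2.5: f_n -> f uniformly, but tau(f_n) = 0 for all n
# while tau(f) = 1, so the largest-minimizer functional tau is not
# continuous at f with respect to the natural topology on R.

def f(t):
    return 0.0 if abs(t) <= 1 else abs(t) - 1.0

def f_n(t, n):
    if 0.0 <= t <= 1.0 + 1.0 / n:
        return t / (n + 1)
    return f(t)

def largest_minimizer(g, grid):
    # grid proxy for tau(g): largest grid point attaining the grid minimum
    m = min(g(t) for t in grid)
    return max(t for t in grid if abs(g(t) - m) < 1e-12)

grid = [i / 100 for i in range(-300, 301)]             # step 0.01 on [-3, 3]
print(largest_minimizer(f, grid))                      # 1.0
print(largest_minimizer(lambda t: f_n(t, 10), grid))   # 0.0
```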

Note that the limit function in our example has no unique minimizing point. So let us consider the family \(S_{\textrm{u}}\) in (2.1) of all functions f with a unique minimizer. Clearly it holds that \(S_{\textrm{u}} \subseteq S \cap S'\) and that the functionals \(\sigma \) and \(\tau \) coincide on \(S_{\textrm{u}}\). Therefore, from Remark 2.3 we can infer that \(\sigma \) is lower- and upper-semicontinuous on the subspace \((S_{\textrm{u}},\mathscr {T}_{S_{\textrm{u}}})\), whence \(\sigma \) is continuous on \((S_{\textrm{u}},\mathscr {T}_{S_{\textrm{u}}})\) with respect to the natural topology \(\mathscr {O}_n\) on \(\mathbb {R}\). We note this in the following

Corollary 2.6

\(\sigma = \tau \) on \(S_{\textrm{u}}\) and

$$\begin{aligned} \sigma :(S_{\textrm{u}},\mathscr {T}_{S_{\textrm{u}}}) \rightarrow (\mathbb {R},\mathscr {O}_n) \end{aligned}$$

is continuous.

Let \(S^*:=S \cap S'\!=\{f\,{:}\,\mathbb {R}\rightarrow \mathbb {R}; f \text { convex and } A(f) \text { a compact interval}\}\) and let \(\xi :S^* \rightarrow \mathbb {R}\) be any measurable selection of A, i.e., \(\xi (f) \in A(f)\) for every \(f \in S^*\) and measurability refers to the trace \(\mathscr {C}_{S^*}\!=S^*\! \cap \mathscr {C}\). Assume that \((f_\alpha ) \subseteq S^*\) converges pointwise to \(f\in S^*\!\). Since \(\sigma (f_\alpha ) \leqslant \xi (f_\alpha ) \leqslant \tau (f_\alpha )\) for all \(\alpha \in I\), it follows from the above Corollary 2.4 (and the characterization of net-convergence in the order topologies, confer the proof of Corollary 2.4) that

$$\begin{aligned} \xi (f_\alpha ) \rightarrow \sigma (f) \;\text { in } \; (\mathbb {R}, \mathscr {O}_>) \quad \text {and} \quad \xi (f_\alpha ) \rightarrow \tau (f) \;\text { in } \; (\mathbb {R},\mathscr {O}_<). \end{aligned}$$

If \(f \in S_{\textrm{u}}\), then \(\sigma (f)=\tau (f)\), whence \(\xi (f_\alpha ) \rightarrow \sigma (f)\) in \((\mathbb {R}, \mathscr {O}_>)\) and in \((\mathbb {R}, \mathscr {O}_<)\), which, as we know, is the same as \(\xi (f_\alpha ) \rightarrow \sigma (f)= \tau (f)\) in the natural topology \(\mathscr {O}_n\). In particular, every measurable selection of A is continuous on the subspace \(S_{\textrm{u}}\) with limit \(\sigma = \tau \). We see here, and will see it another time later, that the class \(S_{\textrm{u}}\) of convex functions with unique minimizing point plays a special role.

Lemma 2.7

\(S_{\textrm{u}} \in \mathscr {C}_{S^*}\).

Proof

Remark 2.3 says that \(\sigma :(S^*\!, \mathscr {C}_{S^*}) \rightarrow \mathbb {R}\) and \(\tau :(S^*\!, \mathscr {C}_{S^*}) \rightarrow \mathbb {R}\) are Borel measurable. Infer from \(S_{\textrm{u}} \subseteq S^*\) that \(S_{\textrm{u}}=\{f \in S^*\,{:}\, \sigma (f)=\tau (f)\}=(\sigma -\tau )^{-1}(\{0\}) \in \mathscr {C}_{S^*}\).\(\square \)

In addition to the topology \(\mathscr {T}\) of pointwise convergence, let C also be endowed with the topology \(\mathscr {T}_{\textrm{uc}}\) of uniform convergence on compacta. It is well known that \(\mathscr {T} \subseteq \mathscr {T}_{\textrm{uc}}\), because uniform convergence on compacta implies pointwise convergence. From Theorem 10.8 of Rockafellar [21] we know that on C the reverse is true. Notice that this is valid only for sequences. Thus the identity \(i:(C,\mathscr {T}) \rightarrow (C,\mathscr {T}_{\textrm{uc}})\) is sequentially continuous at every \(f \in C\). Continuity of i would give \(\mathscr {T}_{\textrm{uc}} = i^{-1}(\mathscr {T}_{\textrm{uc}}) \subseteq \mathscr {T}\) as desired, but unfortunately, in general topological spaces sequential continuity does not imply continuity. The implication is true if the space is first countable, confer Theorem 7.1.3 in Singh [22]. At this stage, however, we do not know whether first countability holds for \((C,\mathscr {T})\). So, we will prove continuity of \(i:(C,\mathscr {T}) \rightarrow (C,\mathscr {T}_{\textrm{uc}})\) traditionally via net-convergence, confer [22, Theorem 4.2.6]. Theorem 2.9 below on net-convergence is not only the key to success, but above all interesting in itself when compared with the sequential convergence occurring in [21, Theorem 10.8]. The proof of Theorem 2.9 is based on the following inequality.

Lemma 2.8

Let D be a dense subset of \(\hspace{1.111pt}\mathbb {R}\). Then for every compact set \(K \subseteq \mathbb {R}\) there exist a constant C and points \(d_1,\ldots ,d_8 \in D\) such that every convex function \(f:\mathbb {R} \rightarrow \mathbb {R}\) satisfies:

$$\begin{aligned} |f(s)-f(t)| \leqslant C \sum _{i=1}^8 |f(d_i)| |s-t| \quad \text {for all}\; \; s,t \in K. \end{aligned}$$
(2.2)

Proof

First, find points a and b from D such that \(K \subseteq [a,b]\). By Theorem 1.3.7 in Niculescu and Persson [20] we have that

$$\begin{aligned} |f(s)-f(t)| \leqslant L |s-t| \quad \text {for all}\; \; s,t \in [a,b], \end{aligned}$$

where \(L= \max \hspace{1.111pt}\{|D^+f(a)|,|D^-f(b)|\}\). Now,

$$\begin{aligned} D^+f(a)=\inf _{x>a}\frac{f(x)-f(a)}{x-a} \leqslant \frac{f(x)-f(a)}{x-a} \leqslant (x-a)^{-1}(|f(x)|+|f(a)|) \end{aligned}$$
(2.3)

for all \(x>a\) and since \(D^+f(a) \geqslant D^-f(a)= \sup _{y<a}({f(y)-f(a)})/({y-a})\) it follows that

$$\begin{aligned} D^+f(a) \geqslant \frac{f(y)-f(a)}{y-a} \geqslant (y-a)^{-1}(|f(y)|+|f(a)|) \quad \text {for all}\; \; y<a. \end{aligned}$$
(2.4)

Next, in (2.3) and in (2.4) we can choose the points x and y from D. Put \(C_1:=\max \hspace{1.111pt}\{(x-a)^{-1}\!,(a-y)^{-1}\} \in (0,\infty )\). Then \(|D^+f(a)|\leqslant C_1 (|f(x)|+|f(a)|+|f(y)|+|f(a)|)\). Similarly one obtains: \(|D^-f(b)|\leqslant C_2 (|f(u)|+|f(b)|+|f(v)|+|f(b)|)\) with positive constant \(C_2\) and points \(u \in D\), \(u<b\) and \(v \in D\), \(v >b\). Finally, if we put \(C:=\max \hspace{0.55542pt}\{C_1,C_2\}\), then \(L \leqslant C (|f(x)|+|f(a)|+|f(y)|+|f(a)|+|f(u)|+|f(b)|+|f(v)|+|f(b)|)\), which shows (2.2).\(\square \)
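The construction of the constant C and the points \(d_1,\ldots ,d_8\) can be traced in code. The following sketch is illustrative: we take \(K=[-1,1]\), \(f(t)=t^2\), and concrete integer sample points playing the roles of a, b, x, y, u, v from the proof, then test inequality (2.2) on random pairs \(s,t \in K\).

```python
import random

# Traces the proof of Lemma 2.8 for K = [-1, 1] and f(t) = t^2: choose
# a < K < b, then x > a, y < a (bounding |D+ f(a)|) and u < b, v > b
# (bounding |D- f(b)|).

def f(t):
    return t * t

a, b = -2.0, 2.0       # K = [-1, 1] is contained in [a, b]
x, y = -1.0, -3.0      # x > a, y < a
u, v = 1.0, 3.0        # u < b, v > b

C1 = max(1.0 / (x - a), 1.0 / (a - y))
C2 = max(1.0 / (b - u), 1.0 / (v - b))
C = max(C1, C2)
d = [x, a, y, a, u, b, v, b]                 # the eight points d_1, ..., d_8
bound = C * sum(abs(f(di)) for di in d)      # dominates the Lipschitz constant L

random.seed(0)
for _ in range(1000):
    s, t = random.uniform(-1, 1), random.uniform(-1, 1)
    assert abs(f(s) - f(t)) <= bound * abs(s - t) + 1e-12   # inequality (2.2)
print("bound", bound, "verified for 1000 random pairs in K")
```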

To state our next result recall that \((I,\leqslant )\) is a directed set.

Theorem 2.9

Let D be dense in \(\mathbb {R}\) and let \((f_\alpha )_{\alpha \in I}\) be a net in C, which converges pointwise on D to a function f, that is \(f_\alpha (t) \rightarrow f(t)\) for all \(t \in D\). Then f is convex and \((f_\alpha )\) converges uniformly to f on every compact subset of \(\mathbb {R}\).

Proof

We first prove uniform convergence. For that purpose, let without loss of generality K be a compact interval. By Lemma 2.8 there are a constant C and \(d_1,\ldots ,d_8 \in D\) such that

$$\begin{aligned} |f_\alpha (s)-f_\alpha (t)| \leqslant C \sum _{i=1}^8 |f_\alpha (d_i)| |s-t| \quad \text {for all}\; \; s,t \in K \;\;\text {and}\; \; \alpha \in I. \end{aligned}$$
(2.5)

By pointwise convergence we find for every \(1 \leqslant i \leqslant 8\) an index \(\alpha _i \in I\) and a constant \(c_i \in \mathbb {R}\) such that \(|f_\alpha (d_i)| \leqslant c_i\) for all \(\alpha \geqslant \alpha _i\). To \(\alpha _1,\ldots ,\alpha _8\) there exists a dominating index \(\alpha ^* \!\in I\) with \(\alpha ^* \geqslant \alpha _1,\ldots ,\alpha _8\). It follows that \(C \sum _{i=1}^8 |f_\alpha (d_i)| \leqslant C \sum _{i=1}^8 c_i =: L \in [0,\infty )\) for all \(\alpha \geqslant \alpha ^*\), whence by (2.5) the family \(\mathscr {F}:=\{f_\alpha \,{:}\, \alpha ^* \leqslant \alpha \in I\}\) is equicontinuous (on K). Furthermore,

$$\begin{aligned} |f_\alpha (t)|\leqslant |f_\alpha (t)-f_\alpha (d_1)|+|f_\alpha (d_1)|\leqslant L|t-d_1|+c_1\quad \text {for all}\;\; \alpha \geqslant \alpha ^*\!\geqslant \alpha _1, \end{aligned}$$

whence \(\mathscr {F}\) is pointwise bounded. Thus by the Arzelà–Ascoli theorem, confer, e.g., Heuser [13], the family \(\mathscr {F}\) is sequentially compact and hence (as a metric space) compact. Therefore, if \((f_{\alpha '})\) is a subnet of \((f_\alpha )_{\alpha ^* \leqslant \alpha \in I}\), then there exists a further subnet \((f_{\alpha ''})\) of \((f_{\alpha '})\), which converges to a function g uniformly on K. In particular, \(f_{\alpha ''}(t) \rightarrow g(t)\) for all \(t \in K\). But by the assumption of pointwise convergence we also know that \(f_{\alpha ''}(t) \rightarrow f(t)\) for all \(t \in K \cap D\). Thus by denseness and continuity \(g=f\) on K and by the subnet-criterion it follows that \((f_\alpha )_{\alpha ^* \leqslant \alpha \in I}\) converges to f uniformly on K, which a fortiori holds for the whole net \((f_\alpha )_{\alpha \in I}\).

In particular, we now know that \((f_\alpha )_{\alpha \in I}\) converges pointwise to f on the entire real line. Thus convexity of f follows from the convexity of \(f_\alpha \) by taking the limit.\(\square \)

Infer from the above Theorem 2.9 that if a net \(f_\alpha \rightarrow f\) in \((C,\mathscr {T})\), then \(f_\alpha \rightarrow f\) in \((C,\mathscr {T}_{\textrm{uc}})\). Obviously, the reverse is true as well. As a consequence we obtain

Corollary 2.10

The topology of pointwise convergence and the topology of uniform convergence on compacta coincide on C:

$$\begin{aligned} \mathscr {T} = \mathscr {T}_{\textrm{uc}}. \end{aligned}$$

Remark 2.11

Let \(\mathscr {T}(D)\) be the topology of pointwise convergence on D. It is generated by the projections \(\pi _t, t \in D\). If D is dense in \(\mathbb {R}\), then Theorem 2.9 actually yields that \(\mathscr {T}(D)=\mathscr {T}_{\textrm{uc}}=\mathscr {T}\). So all the topologies match.

The observation in Remark 2.11 leads to the following variant of the semi-continuity.

Corollary 2.12

Let D be dense in \(\mathbb {R}\). If in Corollary 2.4 the assumption is replaced by \((f_\alpha )_{\alpha \in I} \subseteq C\) converges pointwise to f on \({\textbf {D}}\), then all statements of Corollary 2.4 remain valid.

Let \(D=\{t_i\,{:}\, i \in \mathbb {N}\}\) be a countable and dense subset of \(\mathbb {R}\). Introduce the special projection map \(H:C \rightarrow \mathbb {R}^\mathbb {N}\) by \(H(f):=(f(t_i))_{i \in \mathbb {N}}\). Note that H depends on D, but we suppress this in our notation. Equip \(\mathbb {R}^\mathbb {N}\) with the product topology \(\Pi \). Denote the range H(C) by R and the relative topology \(R \cap \Pi \) by \(\mathscr {R}\). With the following result we will prove a functional limit theorem for convex stochastic processes.

Lemma 2.13

The map H is a bijection onto its range and its inverse \(H^{-1}:(R,\mathscr {R}) \rightarrow (C,\mathscr {T})\) is continuous.

Proof

If \(H(f)=H(g)\), then \(f=g\) on D and by continuity and denseness of D the equality holds on the entire real line. Thus H is injective, and it is surjective by construction.

As to continuity of the inverse consider a sequence \((r_n)\) with

$$\begin{aligned} r_n \rightarrow r \quad \text {in} \;\; (R,\mathscr {R}). \end{aligned}$$
(2.6)

Since \((r_n) \subseteq R\), we find to each \(n \in \mathbb {N}\) a function \(f_n \in C\) such that \(r_n = H(f_n)=(f_n(t_i))_{i \in \mathbb {N}}\). For the same reason there is some \(f \in C\) with \(r=H(f)=(f(t_i))_{i \in \mathbb {N}}\). Recall that convergence in \(\Pi \) or in \(\mathscr {R}\), respectively, is the same as coordinate-wise convergence. Thus the convergence in (2.6) means that \(f_n(t_i) \rightarrow f(t_i)\) for all \(i \in \mathbb {N}\). Since D lies dense in \(\mathbb {R}\) we can apply [21, Theorem 10.8] (or our Theorem 2.9) to infer that actually \(f_n(t) \rightarrow f(t)\) for every \(t \in \mathbb {R}\). Now, by definition \(f_n=H^{-1}(r_n)\) and \(f=H^{-1}(r)\), whence we arrive at \(H^{-1}(r_n) \rightarrow H^{-1}(r)\) in \((C,\mathscr {T})\). Consequently, \(H^{-1}\) is continuous.\(\square \)

3 Applications in probability and statistics

Let \((\Omega ,\mathscr {A})\) be a measurable space. For a map \(Z:\Omega \rightarrow C\) we write \(Z(\omega ,t):=Z(\omega )(t)\) for the value of the function \(Z(\omega ):\mathbb {R} \rightarrow \mathbb {R}\) (trajectory) at point \(t \in \mathbb {R}\). Very often it is more convenient to write Z(t) instead of \(Z(\omega ,t)\); this ambiguity in the notation is resolved by the context. Let \(\mathscr {B}:=\mathscr {B}(\mathbb {R})\) denote the Borel-\(\sigma \) algebra on \(\mathbb {R}\). If \(Z(\hspace{1.111pt}{\cdot }\hspace{1.111pt},t):\Omega \rightarrow \mathbb {R}\) is \(\mathscr {A}\text {-}\mathscr {B}\) measurable for each \(t \in \mathbb {R}\), then Z is called a convex stochastic process. This is the same as saying that Z(t) is a real random variable for all \(t \in \mathbb {R}\). If \(Z(\omega ) \in U\) for all \(\omega \in \Omega \), where U is a subset of C, we say that Z is a convex stochastic process in U or for short a process in U. In other words, all trajectories of Z are U-valued.

Let \(\mathscr {B}(C):=\sigma (\mathscr {T})\) be the Borel-\(\sigma \) algebra pertaining to the topology of pointwise convergence. By Corollary 2.10 it coincides with \(\mathscr {B}_{\textrm{uc}}(C):=\sigma (\mathscr {T}_{\textrm{uc}})\). The following result yields a convenient characterization of the Borel-\(\sigma \) algebra. Recall that by definition \(\mathscr {C}=\sigma (\pi _t\,{:}\, t \in \mathbb {R})\).

Proposition 3.1

$$\begin{aligned} \mathscr {B}(C)=\mathscr {B}_{\textrm{uc}}(C)=\mathscr {C}. \end{aligned}$$

Proof

Let \(C^*:=\{f\,{:}\,\mathbb {R} \rightarrow \mathbb {R}; f \text { continuous}\}\) be endowed with the topology \(\mathscr {T}_{\textrm{uc}}^*\) of uniform convergence on compacta. One verifies easily that \(\mathscr {T}_{\textrm{uc}}= C \cap \mathscr {T}_{\textrm{uc}}^*\). If \(i:C \rightarrow C^*\) is the natural injection into \(C^*\), i.e., \(i(f)=f\), then \(\mathscr {T}_{\textrm{uc}}=i^{-1}(\mathscr {T}_{\textrm{uc}}^*)\), whence

$$\begin{aligned} \sigma (\mathscr {T}_{\textrm{uc}})= \sigma (i^{-1}(\mathscr {T}_{\textrm{uc}}^*))=i^{-1}(\sigma (\mathscr {T}_{\textrm{uc}}^*))=C \cap \sigma (\mathscr {T}_{\textrm{uc}}^*), \end{aligned}$$

where the second equality is ensured by Lemma 1.2.5 in Gänssler and Stute [11]. By Lemma A5.1 in Kallenberg [17] we have that \(\sigma (\mathscr {T}_{\textrm{uc}}^*) = \sigma (\pi _t^*\,{:}\, t \in \mathbb {R})\), where \(\pi _t^*:C^* \rightarrow \mathbb {R}\) is the projection on \(C^*\!\). But \(C \cap \sigma (\pi _t^*\,{:}\, t \in \mathbb {R})=\sigma (\pi _t\,{:}\, t \in \mathbb {R})\), which gives the desired result.\(\square \)

If Z is a process in \(U \subseteq C\) it can be regarded as a map \(Z:\Omega \rightarrow U\) with Borel-\(\sigma \) algebra \(\mathscr {B}(U)=\sigma (\mathscr {T}_U)=U \cap \mathscr {B}(C)= U \cap \mathscr {C} = \mathscr {C}_U\) by Proposition 3.1.

Lemma 3.2

Assume that Z is a convex stochastic process. Then Z is \(\mathscr {A}\text {-}\mathscr {B}(C)\) measurable. If actually Z is a process in \(U \subseteq C\), then Z is \(\mathscr {A}\text {-}\mathscr {B}(U)\) measurable.

Proof

The first assertion follows from Proposition 3.1, which enables us to apply Proposition 1.2.11 in Gänssler and Stute [11]. The second assertion follows from \(\mathscr {B}(U)=U \cap \mathscr {B}(C)\) in combination with the first assertion.\(\square \)

Recall that \(S^*\!=S \cap S'\) and \(\xi :(S^*\!,\mathscr {C}_{S^*}) \rightarrow (\mathbb {R},\mathscr {B})\) denotes any measurable selection of A.

Corollary 3.3

If Z is a convex stochastic process in S, in \(S'\) or in \(S^*\!\), then \(\sigma (Z)\), \(\tau (Z)\) or \(\xi (Z)\), respectively, are real random variables.

Proof

Lemma 3.2 says that \(Z:(\Omega ,\mathscr {A}) \rightarrow (S,\mathscr {B}(S))\) is measurable. But \(\mathscr {B}(S)=S \cap \mathscr {B}(C)=S \cap \mathscr {C}= \mathscr {C}_S\), where the second equality holds by Proposition 3.1. From Proposition 2.1 we know that \(\sigma :(S,\mathscr {C}_S) \rightarrow (\mathbb {R},\mathscr {B})\) is measurable, whence \(\sigma (Z)=\sigma \hspace{1.111pt}{\circ }\hspace{1.111pt}Z\) is measurable as a composition of measurable maps. Replacing S by \(S'\) or \(S^*\) gives measurability of \(\tau (Z)\) or \(\xi (Z)\), respectively.\(\square \)

The concept of convergence in distribution is well known for random variables with values in a metric space. A classical reference here is the book of Billingsley [1]. In contrast, the extension of the concept from metric spaces to topological spaces seems less known. It goes back to Gänssler and Stute [11], who in turn modify the ideas of Topsøe [23]. Let Z and \(Z_\alpha \), \(\alpha \in I\), be random variables defined on a probability space \((\Omega , \mathscr {A}, \mathbb {P})\) with values in some topological space \((X,\mathscr {O})\), that is \(Z:\Omega \rightarrow X\) and \(Z_\alpha :\Omega \rightarrow X\) are \(\mathscr {A}\text {-}\mathscr {B}(X)\) measurable, where \(\mathscr {B}(X):=\sigma (\mathscr {O})\) denotes the Borel-\(\sigma \) algebra. Then the net \((Z_\alpha )_{\alpha \in I}\) converges in distribution to Z in \((X,\mathscr {O})\), if

$$\begin{aligned} \liminf _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{\in }\, O) \geqslant \mathbb {P}\hspace{0.55542pt}(Z\,{ \in }\, O) \quad \text {for all}\; \; O \in \mathscr {O}. \end{aligned}$$
(3.1)

This is denoted by \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((X,\mathscr {O})\) and by complementation is equivalent to

$$\begin{aligned} \limsup _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{\in }\, F) \leqslant \mathbb {P}\hspace{0.55542pt}(Z \,{\in }\, F) \quad \text {for all}\; \; F \in \mathscr {F}, \end{aligned}$$
(3.2)

where \(\mathscr {F}\) is the family of all closed sets in \((X,\mathscr {O})\).

The following result plays an important role in what follows. For this reason we state it explicitly here. The proof is comparatively simple and can be found in Gänssler and Stute [11], p. 345.

Theorem 3.4

(Continuous Mapping) Let \((X,\mathscr {O})\) and \((E,\mathscr {G})\) be topological spaces, \(h:X \rightarrow E\) be \(\mathscr {B}(X)\text {-}\mathscr {B}(E)\) measurable and \(D_h:=\{x \in X\,{:}\, h \text { is discontinuous at } x \}\). Suppose Z and \(Z_\alpha \), \(\alpha \in I\), are random variables over \((\Omega , \mathscr {A},\mathbb {P})\) with values in \((X,\mathscr {O})\), where \(\mathbb {P}^*(Z \in D_h)=0\) with \(\mathbb {P}^*\) the outer measure of \(\hspace{1.111pt}\mathbb {P}\). Then \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((X,\mathscr {O})\) entails \(h(Z_\alpha ) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} h(Z)\) in \((E,\mathscr {G})\).

Let \((Y,\mathscr {O}_Y)\) with \(Y \subseteq X\) and \(\mathscr {O}_Y:=Y \cap \mathscr {O}\) be a subspace of \((X,\mathscr {O})\). Notice that \(\mathscr {B}(Y)= \sigma (\mathscr {O}_Y)= Y \cap \mathscr {B}(X)\). So, a map \(Z:\Omega \rightarrow X \) with range contained in Y is \(\mathscr {A}\text {-}\mathscr {B}(Y)\) measurable (considered as a map into Y) if and only if it is \(\mathscr {A}\text {-}\mathscr {B}(X)\) measurable. Suppose Z and \(Z_\alpha \) for all \(\alpha \in I\) are random variables with values in the subspace. Then \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((Y,\mathscr {O}_Y)\) is equivalent to \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((X,\mathscr {O})\). Indeed, the natural injection \(i:(Y,\mathscr {O}_Y) \rightarrow (X,\mathscr {O})\) given by \(i(x)=x\) is continuous. Thus the continuous mapping theorem (CMT) shows sufficiency. The necessity follows from (3.1) upon noticing that \(\{Z_\alpha \in Y \cap O\}=\{Z_\alpha \in O\}\) and \(\{Z \in Y \cap O\}=\{Z \in O\}\) for all \(O \in \mathscr {O}\). We call this equivalence the Subspace-lemma.

For further properties including the Portmanteau-Theorem we refer to Chapter 8.4 in Gänssler and Stute [11].

Recall the left- and right-order topologies \(\mathscr {O}_<\) and \(\mathscr {O}_>\), which are not metrizable. If a net \((x_\alpha )_{\alpha \in I}\) converges in \((\mathbb {R}, \mathscr {O}_<)\) and in \((\mathbb {R}, \mathscr {O}_>)\), then it converges in the natural topology \(\mathscr {O}_n\), and vice versa. The following example shows that there is a counterpart for distributional convergence.

Example 3.5

$$\begin{aligned}&Z_\alpha {\mathop {\rightarrow }\limits ^{\mathscr {D}}} Z \; \text { in } (\mathbb {R},\mathscr {O}_>) \ \ \Longleftrightarrow \ \ \liminf _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{>}\,x) \geqslant \mathbb {P}\hspace{0.55542pt}(Z \,{>}\,x) \; \text { for all } x \in \mathbb {R}. \end{aligned}$$
(3.3)
$$\begin{aligned}&Z_\alpha {\mathop {\rightarrow }\limits ^{\mathscr {D}}} Z \; \text { in } (\mathbb {R},\mathscr {O}_<) \ \ \Longleftrightarrow \ \ \liminf _\alpha \mathbb {P}(Z_\alpha \,{<}\,x) \geqslant \mathbb {P}\hspace{0.55542pt}(Z\,{ <}\,x) \; \text { for all } x \in \mathbb {R}. \end{aligned}$$
(3.4)
$$\begin{aligned}&Z_\alpha {\mathop {\rightarrow }\limits ^{\mathscr {D}}} Z \; \text { in } (\mathbb {R},\mathscr {O}_>) \text { and in } (\mathbb {R},\mathscr {O}_<) \ \Longleftrightarrow \ Z_\alpha {\mathop {\rightarrow }\limits ^{\mathscr {D}}} Z \; \text { in } \; (\mathbb {R},\mathscr {O}_n). \end{aligned}$$
(3.5)

Here, (3.3) and (3.4) are immediate consequences of the definitions. In (3.5) the sufficiency of the right side follows from \(\mathscr {O}_n \supseteq \mathscr {O}_<\) and \(\mathscr {O}_n \supseteq \mathscr {O}_>\). To see necessity let \(x \in \mathbb {R}\). Then we obtain:

$$\begin{aligned} \mathbb {P}\hspace{0.55542pt}(Z<x)&\leqslant \liminf _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{<}\,x) \quad \text { by } (3.4)\\&\leqslant \limsup _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{<}\,x) \\&\leqslant \limsup _\alpha \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{\leqslant }\, x) \leqslant \mathbb {P}\hspace{0.55542pt}( Z \,{\leqslant }\, x) \quad \text { by } (3.3) \text { and complementation}. \end{aligned}$$

Thus, if the distribution function of Z is continuous at x, i.e., \(\mathbb {P}(Z\,{<}\,x)=\mathbb {P}(Z \,{\leqslant }\, x)\), it follows that \(\lim _{\alpha } \mathbb {P}\hspace{0.55542pt}(Z_\alpha \,{\leqslant }\, x) = \mathbb {P}\hspace{0.55542pt}( Z \,{\leqslant }\, x)\), as required.

Deduce from (3.3): If \(Z_\alpha \leqslant Z_\alpha ^*\) \(\mathbb {P}\)-almost surely (a.s.) for all \(\alpha \geqslant \alpha _0 \in I\) and if \(Z \geqslant Z^*\) a.s., then \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((\mathbb {R},\mathscr {O}_>)\) entails \(Z_\alpha ^* {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z^*\) in \((\mathbb {R},\mathscr {O}_>)\). (property 1)

Deduce from (3.4): If \(Z_\alpha \geqslant Z_\alpha ^*\) a.s. for all \(\alpha \geqslant \alpha _0 \in I\) and if \(Z \leqslant Z^*\) a.s., then \(Z_\alpha {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \( (\mathbb {R},\mathscr {O}_<)\) entails \(Z_\alpha ^* {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z^* \) in \( (\mathbb {R},\mathscr {O}_<)\). (property 2)

In particular, this shows that the limit variables are not unique.
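The tail-function dominances behind (3.3) and property 1 can be checked concretely. The following numerical sketch (ours, not part of the original development; it assumes NumPy and uses empirical tail probabilities) takes \(Z \sim \textrm{Uniform}(0,1)\), \(Z_n = Z + 1/n\) and \(Z^* = Z/2 \leqslant Z\); since the dominances hold pointwise on the same sample, the empirical tail inequalities hold exactly, not merely up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=100_000)   # sample of Z ~ Uniform(0, 1)

def tail(sample, x):
    """Empirical tail probability P(sample > x)."""
    return float(np.mean(sample > x))

xs = np.linspace(-0.5, 1.5, 41)

# (3.3): Z_n := Z + 1/n satisfies {Z > x} subset of {Z_n > x}, so the
# empirical tails obey P(Z_n > x) >= P(Z > x) for every x.
for n in (1, 10, 100):
    zn = z + 1.0 / n
    assert all(tail(zn, x) >= tail(z, x) for x in xs)

# non-uniqueness of the limit (property 1 with Z_alpha^* = Z_alpha):
# since Z >= Z* := Z/2 pointwise, P(Z > x) >= P(Z* > x) for all x,
# so the same sequence also converges to Z* in (R, O_>).
zstar = z / 2.0
assert all(tail(z, x) >= tail(zstar, x) for x in xs)
```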

Our next result characterizes distributional convergence for random variables with values in the function space \((C,\mathscr {T})\). Below the Euclidean space \(\mathbb {R}^k\) is endowed with the product topology \(\mathscr {O}_n^k\).

Proposition 3.6

(Functional limits) Let Z and \(Z_\alpha \), \(\alpha \in I\), be convex stochastic processes. Then they are random variables in \((C,\mathscr {T})\) and the following statements (3.6) and (3.7) are equivalent:

$$\begin{aligned}&Z_\alpha \xrightarrow { \ \mathscr {D} \ } Z \quad \text {in} \;\; (C,\mathscr {T}), \end{aligned}$$
(3.6)
$$\begin{aligned}&(Z_\alpha (t_1),\ldots ,Z_\alpha (t_k)) \xrightarrow { \ \mathscr {D} \ } (Z(t_1),\ldots ,Z(t_k)) \quad \text {in}\;\; (\mathbb {R}^k\!, \mathscr {O}_n^k) \end{aligned}$$
(3.7)

for every \(k \in \mathbb {N}\) and for each collection of points \(t_1,\ldots , t_k \in D\), where D is a countable and dense subset of \(\hspace{1.111pt}\mathbb {R}\).

Proof

The first assertion holds by Lemma 3.2. Assume (3.6) holds. By definition of \(\mathscr {T}\) every projection \(\pi _t:(C,\mathscr {T}) \rightarrow (\mathbb {R}, \mathscr {O}_n)\) is continuous, whence the product map \(\pi :=(\pi _{t_1},\ldots ,\pi _{t_k}):(C,\mathscr {T}) \rightarrow (\mathbb {R}^k\!, \mathscr {O}_n^k)\) is continuous as well. Since \((Z_\alpha (t_1),\ldots ,Z_\alpha (t_k))= \pi (Z_\alpha )\) and \((Z(t_1),\ldots ,Z(t_k))= \pi (Z)\), an application of the CMT yields (3.7).

For the converse first note that by countability we have that \(D=\{t_1,t_2,\ldots \}\). Recall the projection map H given in Lemma 2.13. It follows from Example 2.6 in Billingsley [2] that (3.7) entails that \(H(Z_\alpha ) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} H(Z)\) in \((\mathbb {R}^\mathbb {N}\!, \Pi )\). Now, the Subspace-lemma says that \(H(Z_\alpha ) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} H(Z)\) in \((R, \mathscr {R})\). By Lemma 2.13 the inverse \(H^{-1}:(R, \mathscr {R}) \rightarrow (C,\mathscr {T})\) is continuous, so that another application of the CMT yields (3.6).\(\square \)

The second statement (3.7) is known as convergence of the finite dimensional distributions (on D) and denoted by \(Z_\alpha {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathrm fd}}_D Z\) (in short: convergence of the fidis).

We are now in a position to formulate several so-called Argmin theorems for convex stochastic processes.

Theorem 3.7

Let D be countable and dense in \(\mathbb {R}\). Consider convex stochastic processes Z and \(Z_\alpha \), \(\alpha \in I\), in U. Suppose that \(Z_\alpha {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathrm fd}}_D Z\). Then the following statements hold:

  1. (1)

    If \(U=S\), then \(\sigma (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \sigma (Z)\) in \((\mathbb {R},\mathscr {O}_>)\). If in addition Z is a process in \(S^*\) with \(Z \in S_{\textrm{u}}\) a.s., then \(\sigma (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \sigma (Z)\) in \((\mathbb {R},\mathscr {O}_n)\).

  2. (2)

    If \(U=S'\!\), then \(\tau (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\) in \((\mathbb {R},\mathscr {O}_<)\). If in addition Z is a process in \(S^*\) with \(Z \in S_{\textrm{u}}\) a.s., then \(\tau (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\) in \((\mathbb {R},\mathscr {O}_n)\).

  3. (3)

    If \(U=S^*\!\), then \(\xi (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \sigma (Z)\) in \((\mathbb {R},\mathscr {O}_>)\) and \(\xi (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\) in \((\mathbb {R},\mathscr {O}_<)\). If in addition \(\sigma (Z) {\mathop {=}\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\), then \(\xi (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \sigma (Z)\) in \((\mathbb {R},\mathscr {O}_n)\).

Proof

First notice that by Corollary 3.3 all involved maps are real random variables. By Proposition 3.6, \(Z_\alpha {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((C,\mathscr {T})\), whence by the Subspace-lemma

$$\begin{aligned} Z_\alpha \xrightarrow { \ \mathscr {D} \ } Z\quad \text {in}\;\; (S,\mathscr {T}_S). \end{aligned}$$
(3.8)

From Corollary 2.2 we know that \(\sigma :(S,\mathscr {T}_S) \rightarrow (\mathbb {R},\mathscr {O}_>)\) is continuous and consequently the CMT yields the first convergence in (1).

By Lemma 2.7, \(S_{\textrm{u}} \in \mathscr {C}_{S^*}\). According to Lemma 3.2, Z is \(\mathscr {A}\text {-}\mathscr {C}_{S^*}\) measurable upon noticing that \(\mathscr {B}(S^*)= S^* \!\cap \mathscr {B}(C) = S^* \!\cap \mathscr {C}=\mathscr {C}_{S^*}\) by Proposition 3.1. Thus \(\{Z \in S_{\textrm{u}}\} \in \mathscr {A}\), whence by Corollary 2.6 it follows that \(0 \leqslant \mathbb {P}^*(Z \,{\in }\, D_\sigma ) \leqslant \mathbb {P}^*(Z \,{\notin }\, S_{\textrm{u}}) = \mathbb {P}\hspace{0.55542pt}(Z\,{ \notin }\, S_{\textrm{u}})=0\). Finally, by Propositions 2.1 and 3.1, \(\sigma :(S,\mathscr {T}_S) \rightarrow (\mathbb {R},\mathscr {O}_n)\) is \(\mathscr {B}(S)\text {-}\mathscr {B}(\mathbb {R})\) measurable. Consequently by (3.8) another application of the CMT gives the second convergence in (1).

Part (2) follows in exactly the same way. Regarding part (3) it should be noted that \(S^*\! \subseteq S\) and \(S^*\! \subseteq S'\!\). Therefore \(\sigma (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \sigma (Z)\) in \((\mathbb {R},\mathscr {O}_>)\) by (1) and \(\tau (Z_\alpha ) {\mathop {\longrightarrow }\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\) in \((\mathbb {R},\mathscr {O}_<)\) by (2). This shows the first assertions in part (3), because \(\sigma (Z_\alpha ) \leqslant \xi (Z_\alpha ) \leqslant \tau (Z_\alpha )\) for all \(\alpha \in I\), so that we can use property 1 and property 2 from Example 3.5. Finally, the second assertion in (3) follows from (3.5) in Example 3.5.\(\square \)
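A classical unique-minimizer instance of Theorem 3.7 can be simulated directly. The following Monte Carlo sketch (our illustration, not taken from the text; it assumes NumPy and a fixed seed) uses \(Z_n(t)=\sum_i \bigl((X_i - t/\sqrt{n}\hspace{1.111pt})^2 - X_i^2\bigr) = t^2 - 2\hspace{0.55542pt}t\sqrt{n}\,\overline{X}_n\) with i.i.d. centered \(X_i\); the fidis converge to those of \(Z(t)=t^2-2\hspace{0.55542pt}tW\) with \(W \sim N(0,\textrm{Var}\,X_1)\), whose unique minimizer is W.

```python
import numpy as np

rng = np.random.default_rng(1)

# Z_n(t) = t^2 - 2 t sqrt(n) Xbar_n is a convex process with unique
# minimizer sqrt(n) * Xbar_n.  Its fidis converge to those of
# Z(t) = t^2 - 2 t W, W ~ N(0, Var X_1), with unique minimizer W, so
# the Argmin theorem gives  argmin Z_n -> W  in distribution.
n, reps = 400, 2000
x = rng.standard_normal((reps, n))        # X_i iid N(0, 1), Var X_1 = 1
argmins = np.sqrt(n) * x.mean(axis=1)     # closed-form minimizers

# the empirical law of the argmins should be close to N(0, 1)
assert abs(argmins.mean()) < 0.1
assert abs(argmins.std() - 1.0) < 0.1
```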

Our Argmin theorems (1)–(3) involve two types of uniqueness assumptions. In (1) and (2) the requirement \(Z \in S_{\textrm{u}}\) a.s. is the same as \(\sigma (Z)=\tau (Z)\) a.s., which in turn implies \(\sigma (Z) {\mathop {=}\limits ^{\scriptscriptstyle \mathscr {D}}} \tau (Z)\) as in part (3). However, if \(\sigma (Z)\) and \(\tau (Z)\) are \(\mathbb {P}\)-integrable, then the reverse implication holds. Indeed, in this case by linearity \(\mathbb {E}\hspace{0.55542pt}[\tau (Z)-\sigma (Z)]=0\). But since the integrand is non-negative, it must be equal to zero a.s.

A comparison shows that, on the one hand, the uniqueness condition in (3) is weaker than that in (1) or (2); on the other hand, the demands on the stochastic processes in (3) are more stringent.

In the following we give some interesting equivalent characterizations for almost sure uniqueness of the minimizing point.

Proposition 3.8

Suppose Z is a convex stochastic process in \(S^*\!\). Then the following three statements are equivalent:

  1. (1)

    \(\sigma (Z)=\tau (Z)\) a.s.

  2. (2)

    \(x \notin A(Z)\) a.s. for Lebesgue-almost every \(x \in \mathbb {R}\).

  3. (3)

    \(0 \notin [D^-Z(x),D^+Z(x)]\) a.s. for Lebesgue-almost every \(x \in \mathbb {R}\).

Proof

We write briefly \(\sigma =\sigma (Z)\) and \(\tau =\tau (Z)\). Moreover, let \(\lambda \) denote the Lebesgue measure on \(\mathbb {R}\). Note that \(\tau -\sigma \geqslant 0\), whence by Corollary 3.3 the expectation exists and is equal to:

$$\begin{aligned} \begin{aligned} \mathbb {E}\hspace{0.55542pt}[\tau -\sigma ]&= \mathbb {E}\hspace{0.55542pt}[\lambda ([\sigma ,\tau ])] = \int _\Omega \int _\mathbb {R} 1_{\{\sigma \leqslant x, \tau \geqslant x\}}\, \lambda ({\textrm{d}}x)\hspace{1.111pt}\mathbb {P}({\textrm{d}}\omega ){} & {} \\&= \int _\Omega \int _\mathbb {R} 1_{\{D^-Z(x) \leqslant 0 \leqslant D^+Z(x)\}} \,\lambda ({\textrm{d}}x)\hspace{1.111pt}\mathbb {P}({\textrm{d}}\omega ){} & {} \text {by Theorem }1.1\\&=\int _\mathbb {R} \mathbb {P}(D^-Z(x) \leqslant 0 \leqslant D^+Z(x)) \,\lambda ({\textrm{d}}x){} & {} \text {by Fubini}\\&=\int _\mathbb {R} \mathbb {P}(x \,{\in }\,A(Z))\, \lambda ({\textrm{d}}x).{} & {} \text {by } (1.2) \end{aligned} \end{aligned}$$

Observe that all occurring integrands are non-negative. Thus the equivalence can be deduced from a well-known result from integration theory, confer, e.g., Lemma 1.15, p. 304 in Dshalalow [5]. \(\square \)
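The key identity \(\mathbb {E}\hspace{0.55542pt}[\tau -\sigma ] = \int _\mathbb {R} \mathbb {P}(x \,{\in }\,A(Z))\, \lambda ({\textrm{d}}x)\) from the proof can be verified by Monte Carlo on a toy process (our sketch, assuming NumPy): \(Z(t)=\max\hspace{0.55542pt}(-t,\, t-W)\) with \(W \sim \textrm{Exp}(1)\) has \(A(Z)=[0,W]\), so both sides equal \(\mathbb {E}\hspace{0.55542pt}[W]=1\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy process: Z(t) = max(-t, t - W) with W ~ Exp(1) is convex and
# attains its minimum 0 exactly on A(Z) = [0, W], so
# sigma(Z) = 0 and tau(Z) = W.
w = rng.exponential(1.0, size=100_000)

lhs = np.mean(w)                # E[tau(Z) - sigma(Z)] = E[W] = 1

# right-hand side: integral over x of P(x in A(Z)) = P(W >= x),
# approximated by a Riemann sum on [0, 15] (the Exp(1)-tail beyond
# 15 is negligible)
xs = np.linspace(0.0, 15.0, 1001)
dx = xs[1] - xs[0]
rhs = sum(np.mean(w >= x) for x in xs) * dx

assert abs(lhs - 1.0) < 0.05
assert abs(lhs - rhs) < 0.05
```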

If Z in the above proposition is in addition differentiable, then \(\sigma (Z)=\tau (Z)\) a.s. if and only if \(Z'(x) \ne 0\) a.s. for \(\lambda \)-almost every \(x \in \mathbb {R}\).

Next, we establish an Argmin theorem for almost sure convergence. Just as in Theorem 3.7, convergence statements are also provided here for the non-unique case. This extends results known so far, confer Theorem 7.77 in Liese and Mieschke [19], which in turn relies on the unpublished preprint of Hjort and Pollard [14].

Theorem 3.9

Let Z and \(Z_\alpha \), \(\alpha \in I\), be convex stochastic processes in U defined on a complete probability space \((\Omega ,\mathscr {A},\mathbb {P})\). Assume that \(Z_\alpha (t) \rightarrow Z(t)\) a.s. for every \(t \in D\) with D a countable and dense subset of \(\hspace{1.111pt}\mathbb {R}\).

  1. (1)

    If \(U=S\), then \(\liminf _{\alpha } \sigma (Z_\alpha ) \geqslant \sigma (Z)\) a.s.

  2. (2)

    If \(U=S'\!\), then \(\limsup _{\alpha } \tau (Z_\alpha ) \leqslant \tau (Z)\) a.s.

  3. (3)

    If \(U=S^*\) and \(Z \in S_{\textrm{u}}\) a.s., then \(\lim _{\alpha } \sigma (Z_\alpha )= \sigma (Z)=\tau (Z)\) a.s. and \(\lim _{\alpha } \tau (Z_\alpha )= \tau (Z)=\sigma (Z)\) a.s.

  4. (4)

    If \(U=S^*\!\), then \(\liminf _{\alpha } \xi (Z_\alpha ) \geqslant \sigma (Z)\) a.s. and \(\limsup _{\alpha } \xi (Z_\alpha ) \leqslant \tau (Z)\) a.s. If in addition \(Z \in S_{\textrm{u}}\) a.s., then \(\lim _{\alpha } \xi (Z_\alpha ) = \sigma (Z)= \tau (Z)\) a.s.

Proof

Put \(\Omega _t:=\{Z_\alpha (t) \,{\rightarrow }\, Z(t)\}\), \(t \in D\). Then \(\Omega _t \in \mathscr {A}\) and \(\mathbb {P}(\Omega _t)=1\) by assumption and by completeness. Thus \(\Omega _0 :=\bigcap _{t \in D} \Omega _t \in \mathscr {A}\) and \(\mathbb {P}(\Omega _0)=1\), because D is countable. Corollary 2.12 yields that \(\Omega _0 \subseteq \{\liminf _{\alpha } \sigma (Z_\alpha ) \geqslant \sigma (Z)\} =:\Omega _1\). It follows from completeness that \(\Omega _1 \in \mathscr {A}\), and by monotonicity of \(\mathbb {P}\) we arrive at \(\mathbb {P}(\Omega _1)=1\), which shows (1). In the same way one obtains (2). Furthermore, Corollary 2.12 ensures that \(\Omega _0 \cap \{Z \,{\in }\, S_{\textrm{u}}\}\) is a subset of \(\{\lim _{\alpha } \sigma (Z_\alpha )\,{=}\, \sigma (Z)\}\) and of \(\{\lim _{\alpha } \tau (Z_\alpha )\,{=}\, \tau (Z)\}\). Again by completeness this gives (3). Since \(S^* \subseteq S\) and \(S^* \subseteq S'\) as well as \(\sigma (Z_\alpha ) \leqslant \xi (Z_\alpha ) \leqslant \tau (Z_\alpha )\) the first part of (4) follows from (1) and (2). Finally, the second part of (4) follows from (3) in combination with the sandwich theorem.\(\square \)

Remark 3.10

If \((Z_\alpha )_{\alpha \in \mathbb {N}}\) in Theorem 3.9 is in fact a sequence, then we can drop the completeness assumption on the underlying probability space \((\Omega ,\mathscr {A},\mathbb {P})\). The reason for this is Corollary 3.3, which guarantees that the sets \(\Omega _t\), \(\Omega _0\) and \(\Omega _1\) are elements of \(\mathscr {A}\).

The Argmin theorem for convergence in probability takes the following form. Note that it is formulated only for sequences and not, more generally, for nets. This is because we carry out the proof via the subsequence criterion; to the best of our knowledge, there is no counterpart of that criterion for nets.

Theorem 3.11

Let \(Z_n\), \(n \in \mathbb {N}\), be convex stochastic processes in U and let \(Z \in S^*\) have an almost surely unique minimizer. Suppose \(Z_n(t) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathbb {P}}} Z(t)\) for each \(t \in D\), where D is a countable and dense subset of \(\mathbb {R}\).

  1. (1)

    If \(U=S\), then \(\sigma (Z_n) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathbb {P}}} \sigma (Z){\mathop {=}\limits ^\mathrm{a.s.}}\tau (Z)\).

  2. (2)

    If \(U=S'\!\), then \(\tau (Z_n) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathbb {P}}} \tau (Z){\mathop {=}\limits ^\mathrm{a.s.}}\sigma (Z)\).

  3. (3)

    If \(U=S^*\!\), then \(\xi (Z_n) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathbb {P}}} \sigma (Z){\mathop {=}\limits ^\mathrm{a.s.}}\tau (Z)\).

Proof

We will use the subsequence criterion, confer, e.g., Lemma 5.2 in Kallenberg [16]. So, let \((n_{0k})_{k \in \mathbb {N}}\) be a subsequence of \(\mathbb {N}\). As a countable set D has the form \(D=\{t_1,t_2,\ldots \}\). From \(Z_n(t_1) {\mathop {\rightarrow }\limits ^{\scriptscriptstyle \mathbb {P}}} Z(t_1)\) it follows with the subsequence criterion that there exists a subsequence \((n_{1i})_{ i \in \mathbb {N}}\) of \((n_{0k})_{k \in \mathbb {N}}\) such that on a set, \(\Omega _1\) say, with probability one we have that \(Z_{n_{1i}}(t_1) \rightarrow Z(t_1)\), \(i \rightarrow \infty \). Another application of the subsequence criterion ensures that there exists a subsequence \((n_{2i})_{i \in \mathbb {N}}\) of \((n_{1i})_{ i \in \mathbb {N}}\) such that \(Z_{n_{2i}}(t_2) \rightarrow Z(t_2)\), \(i \rightarrow \infty \), on a set, \(\Omega _2\) say, which has probability one. Continuing in this way we find for every \(j \geqslant 1\) a subsequence \((n_{ji})_{i \in \mathbb {N}}\) of \((n_{j-1,i})_{i \in \mathbb {N}}\) and a set \(\Omega _j\) with \(\mathbb {P}(\Omega _j)=1\) such that \(Z_{n_{ji}}(t_j) \rightarrow Z(t_j)\), \(i \rightarrow \infty \), on \(\Omega _j\). Set \(\Omega _0:=\bigcap _{j \geqslant 1} \Omega _j\). Observe that \(\mathbb {P}(\Omega _0)=1\). The “diagonal” \((n_{ii})_{i \in \mathbb {N}}\) is a subsequence of the given sequence \((n_{0k})_{k \in \mathbb {N}}\). Consider an arbitrary index \(j \geqslant 1\). Apart from the first \(j-1\) terms the sequence \((n_{ii})_{i \in \mathbb {N}}\) is a subsequence of \((n_{ji})_{i \in \mathbb {N}}\). But on \(\Omega _0\), along that sequence we have convergence at point \(t_j\), whence in particular \(Z_{n_{ii}}(t_j) \rightarrow Z(t_j)\), \(i \rightarrow \infty \). Thus we arrive at \(Z_{n_{ii}}(t) \rightarrow Z(t)\), \(i \rightarrow \infty \), for all \(t \in D\) a.s.
Especially, the sequence \((Z_{n_{ii}})_{i \in \mathbb {N}}\) fulfils the requirements of Theorem 3.9, which yields that \(\sigma (Z_{n_{ii}}) \rightarrow \sigma (Z)\) a.s. or that \(\tau (Z_{n_{ii}}) \rightarrow \tau (Z)\) a.s., respectively, and another application of the subsequence criterion gives (1) and (2). The assertion (3) follows from Theorem 3.9 (4).\(\square \)
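A standard M-estimation instance of Theorem 3.11 is the sample-median objective. The following sketch (ours, for illustration; it assumes NumPy) uses \(Z_n(t)=n^{-1}\sum_i(|X_i-t|-|X_i|)\): by the law of large numbers \(Z_n(t)\) converges in probability, for each fixed t, to \(Z(t)=\mathbb {E}\hspace{0.55542pt}|X_1-t| - \mathbb {E}\hspace{0.55542pt}|X_1|\), whose unique minimizer is the median of \(X_1\), and \(\sigma (Z_n)\) is the lower sample median.

```python
import numpy as np

rng = np.random.default_rng(3)

# Z_n(t) = (1/n) * sum_i (|X_i - t| - |X_i|) is convex in t; its
# minimum set is the interval of sample medians, whose left endpoint
# (the lower sample median) is sigma(Z_n).
true_median = 1.0

def lower_sample_median(n):
    x = true_median + rng.standard_normal(n)   # median(X_1) = 1
    xs = np.sort(x)
    # for even n the minimum set of Z_n is [X_(n/2), X_(n/2+1)]
    # (1-based order statistics); return its left endpoint
    return xs[(n - 1) // 2]

errors = [abs(lower_sample_median(10_000) - true_median) for _ in range(50)]
assert np.mean(errors) < 0.05   # sigma(Z_n) concentrates at the median
```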

3.1 Concluding remarks

Argmin theorems for convex processes (and distributional convergence) have been known since Davis et al. [3], Hjort and Pollard [14], Geyer [12] or Knight [18]. There the processes can even be defined on \(\mathbb {R}^d\) and not merely on \(\mathbb {R}\) as in this paper. Apart from that, however, there are two notable differences. First, the publications mentioned above only consider sequences, whereas we more generally allow for nets of processes. Secondly, we also provide results when the limit process does not have a unique minimizing point. Incidentally, both points also apply to the Argmin theorems for almost sure convergence. The idea behind this is to understand semi-continuity of \(\sigma \) or \(\tau \) as continuity with respect to the order topologies in place of the natural topology on \(\mathbb {R}\). After that the continuous mapping theorem does the rest. Another way of looking at our approach is this: As long as a unique minimizer of the limit process exists, it is a natural candidate for the limit variable. If this is not the case, we do not search for new candidates, but simply make the topology on \(\mathbb {R}\) smaller. This also differs from Ferger’s [8] innovative approach, which retains the natural topology but more generally allows Choquet-capacities to play the part of limit “distributions”. The applicability of the Argmin theorem for convex processes lies in the fact that here, in contrast to such processes in larger function spaces, the only prerequisite is that of convergence of the finite dimensional distributions, confer Ferger [10] for an application in M-estimation. For example, let \(C^*=\{f:\mathbb {R}\rightarrow \mathbb {R}; f \text { is continuous}\} \supset C\) be endowed with the topology \(\mathscr {T}_{\textrm{uc}}^*\) of uniform convergence on compacta.
Here, even if one considers only sequences, one needs not only a functional limit theorem \(Z_n {\mathop {{\rightarrow }}\limits ^{\scriptscriptstyle \mathscr {D}}} Z\) in \((C^*\!,\mathscr {T}_{\textrm{uc}}^*)\), but also stochastic boundedness of the \(\xi (Z_n)\):

$$\begin{aligned} \lim _{d \rightarrow \infty } \limsup _{n \rightarrow \infty } \mathbb {P}\hspace{0.55542pt}(|\xi (Z_n)|\,{>}\,d)=0, \end{aligned}$$
(3.9)

see Ibragimov and Has’minski [15], van der Vaart and Wellner [24] or Ferger [9]. For the proof of the functional limit theorem alone, besides convergence of the fidis also tightness of the sequence \((Z_n)\) is required, which is usually established through maximal inequalities. In addition, (3.9) must be proved, usually via upper estimates for the tail probabilities. This means that the programme that has to be worked through is much more extensive and demanding than in the convex case.

There are two answers to the question why this is so. Firstly, by Proposition 3.6 the sole assumption that the fidis converge already yields a functional limit theorem. Secondly, in \((C^*\!,\mathscr {T}_{\textrm{uc}}^*)\) there is no counterpart of Corollary 2.6, which says that \(\sigma \) is continuous on \(S_{\textrm{u}}\). For example, let \(\sigma ^*(f)\) denote the smallest minimizing point of \(f \in C^*\) (existence assumed). Then one can construct a sequence \((f_n)\) such that \(f_n\) converges to f uniformly on every compact \(K \subseteq \mathbb {R}\), i.e., \(f_n \rightarrow f\) in \((C^*\!,\mathscr {T}_{\textrm{uc}}^*)\), but \(\sigma ^*(f_n) \rightarrow -\infty \). In particular, \(\sigma ^*\) is far from being continuous on \(S_{\textrm{u}}\).