Sharp bounds on distribution functions and expectations of mixtures of ordered families of distributions

We compare two mixtures of an arbitrary collection of stochastically ordered distribution functions with respect to fixed mixing distributions. Under the assumption that the first mixture distribution function is known, we establish optimal lower and upper bounds on the second mixture distribution function, and present single families of ordered distributions which attain the bounds uniformly at all real arguments. Furthermore, we determine sharp upper and lower bounds on the differences between the expectations of the mixtures expressed in various scale units. The general results are illustrated by several examples.


Introduction
Let {F_θ}_{θ∈R} be an arbitrary family of stochastically ordered distribution functions, i.e., ones that satisfy (1.1). We assume that the family is not known. We further introduce two distribution functions S and T which are assumed to be known. We do not impose any restrictions on them. They may have discrete and/or continuous components, and supports concentrated on possibly partially or even fully different subsets of the real line. The purpose of the paper is to compare the mixture distribution functions and their expectations. Our model is motivated by nonparametric Bayes and regression ideas. The relation between the predictor θ and the random response with distribution function F_θ is completely unknown, except for the frequently natural and practically justified premise that a greater value of the predictor results in a (stochastically) greater response. We investigate the consequences of various choices of priors S and T for the wide nonparametric response model restricted merely by the above order constraints. Classic Bayes procedures focus on identifying the predictor value governing a single random experiment. Our approach is global: we analyze the final consequences of a random choice of predictor, and of the random response to the selected predictor. This corresponds to the situation of multiple repetitions of the experiment where various values of the predictor are chosen according to the prior selection rule.
Precisely, in Sect. 2 we determine sharp lower and upper bounds on the distribution functions (1.3) under the constraint that condition (1.2) is satisfied for an arbitrarily fixed G. The lower and upper bounds constitute proper distribution functions iff the mixing distribution function T does not have positive mass to the right and to the left, respectively, of the support of S. We also show that the bounds are attained uniformly for any G, i.e., there exist single extremal families of ordered distributions such that (1.2) holds, and the lower and upper bounds on (1.3) are attained, respectively, for every real argument x. In Sect. 3, we determine the greatest possible lower and upper deviations of EY = ∫_R x H(dx) from EX = ∫_R x G(dx) (cf (1.2) and (1.3)), measured in various scale units (E|X − EX|^p)^{1/p} = (∫_R |x − EX|^p G(dx))^{1/p}, p ≥ 1, based on the central absolute moments of (1.2). We illustrate the theoretical results by a number of examples in Sect. 4. Section 5 is devoted to the proof of Theorem 1 of Sect. 2, which is the basic result of the paper, providing the tools for establishing the expectation bounds of Sect. 3. Mixture models have multiple applications in probability and statistics. It was shown in the review paper of Karlis and Xekalaki (2003) that they are exploited in data modeling, discriminant and cluster analysis, outlier and robustness studies, analysis of variance, random effects and related models, factor analysis and latent structure models, random variable generation, and approximation of distributions. Here we merely mention several applications in nonparametric Bayesian inference and regression analysis. One of the pioneering works on nonparametric Bayesian estimation methods was that of Ferguson (1973), who provided Bayes estimates of the response distribution function and several of its functionals under the Dirichlet process prior. Ghosh and Mukherjee (2005) presented a sequential version of the distribution function estimate.
Susarla and Ryzin (1978), Tiwari and Zalkikar (1990), Gasparini (1996), and Zhou (2004) determined nonparametric Bayes estimators of the distribution function under various censoring schemes. Sethuraman and Hollander (2009) estimated the nominal lifetime distribution of a unit exposed to various repair treatments. Hansen and Lauritzen (2002) proposed a method of Bayesian estimation of an arbitrary concave distribution function when the prior is a proper mixture of Dirichlet processes. For Bayesian density estimation, we refer to Escobar and West (1995) and Vieira et al. (2015), whereas the hazard rate function was treated by McKeague and Tighiouart (2002).
From the vast nonparametric Bayesian regression literature, we mention only the following. Choi (2008) studied convergence of posterior distributions when the response-predictor relation was modeled by mixtures of parametric densities, and a sieve prior was assumed. Chung and Dunson (2009) estimated the conditional response distribution and identified significant predictors under the assumption of a probit stick-breaking process prior. Zhu and Dunson (2013) used nested Gaussian processes as locally adaptive priors for the regression model. Jo et al. (2016) considered quantile regression problems with a Dirichlet process mixture modeling the error distribution.
A small proportion of nonparametric Bayes research is devoted to order restricted inference. Assuming a restricted dependent Dirichlet prior for a collection of partially ordered distributions, Dunson and Peddada (2008) tested equalities in the homogeneous groups and estimated group-specific distributions. Yang et al. (2011) developed posterior computations for a stochastically ordered latent class model based on the nested Dirichlet process prior. Wright (2007, 2008) performed Bayesian multiple comparisons for ordered medians and means, respectively. The model of stochastically ordered mixtures presented here was earlier examined by Miziuła (2017), who established tight lower and upper bounds on the ratios of various dispersion measures of mixtures. Some similarities to our approach can also be found in Robertson and Wright (1982), where lower and upper bounds on mixtures of stochastically ordered Chi-squared distributions, useful in testing homogeneity of normal means against their ordering, were established.

Distribution functions of mixtures
The main result of this section is the following.
Theorem 1 Let S, T, and G be fixed distribution functions on R. Then for every family {F_θ}_{θ∈R} of stochastically ordered distribution functions (cf (1.1)) satisfying (1.2), the inequalities (2.1) hold. Their actual image sets can be smaller, though. We explain when this occurs with the use of a natural notion of ordering of intervals, which we also use frequently in the proof of Theorem 1. We say that interval Δ_1 precedes interval Δ_2, and write Δ_1 ≺ Δ_2, iff for every θ_1 ∈ Δ_1 and θ_2 ∈ Δ_2 we have θ_1 < θ_2. We admit that the intervals may or may not contain their left and right end-points. Accordingly, H • G is a distribution function iff (2.4) holds (for simplicity of notation, we use the same symbol for the distribution function and the respective probability distribution). Clearly, H • G is not a proper distribution function iff the supremum of H • G(R) is less than 1. This means that H • G is not a distribution function when T has a positive probability mass to the right of the support of S. This may happen even when the support of T is contained in that of S, but they have a common right end-point at which T has an atom and S does not. Formally, we admit the situation u(S, T) = 0 when the whole mass of T is located to the right of the mass of S, and then H • G simply vanishes. Analogously, H̄ • G is a proper distribution function iff the symmetric condition (2.7) holds. It is clear that ū ≥ u.

Remark 4 Observe that the inverses of H and H̄ are the greatest convex minorant and the smallest concave majorant, respectively, of the set U(T, S), which are exploited in minimizing and maximizing the mixture distribution function ∫_R F_θ(x) S(dθ) under the restriction that ∫_R F_θ(x) T(dθ) has a fixed form.
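For a finite subset of such a point set, the greatest convex minorant is just the lower boundary of the convex hull of the points. A minimal sketch (the function name and the sample points below are ours, not the paper's):

```python
def lower_convex_hull(points):
    """Greatest convex minorant of a finite point set, i.e., the lower
    boundary of its convex hull (lower part of Andrew's monotone chain)."""
    pts = sorted(set(points))
    hull = []
    for x, y in pts:
        # pop the last vertex while it lies on or above the new edge
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) <= (y2 - y1) * (x - x1):
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

# a point above the chord is dropped, a point below it is kept
print(lower_convex_hull([(0, 0), (0.5, 0.8), (1, 1)]))
print(lower_convex_hull([(0, 0), (0.5, 0.2), (1, 1)]))
```

Evaluating the piecewise-linear interpolation of the returned vertices then gives the minorant at any argument u; the smallest concave majorant is obtained symmetrically from the upper boundary.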
The proofs of the lower and upper bounds, and of their attainability, are rather long, but similar. Therefore, in the proof presented in Sect. 5, we restrict ourselves to the lower bound case. The families which attain the bounds in (2.1) strongly depend on the particular shape of the set (2.2), and on the resulting forms of the greatest convex minorant and the smallest concave majorant. It is impossible to describe them in a concise form for general S and T (the dependence on G is much simpler here) in the statement of Theorem 1. The precise construction of the family attaining the lower bound is presented in the proof of Theorem 1. The construction of the family attaining the upper bound is analogous, and it is not presented there.

Expectations of mixtures
In this section we examine variations of the expectations of mixtures under stochastic ordering of the mixed distributions. Suppose that we consider a mixture of an arbitrary unknown family of ordered distribution functions, and the actual mixing distribution is S. The resulting random variable X has an unknown distribution function (1.2). However, we assume that the mixing distribution is T, different from S. This generates a random variable Y with a different distribution function (1.3) and expectation EY. Our purpose is to evaluate the maximal possible differences between the assumed and actual expectations of the mixtures, EY − EX. This is measured in various scale units σ_p = (E|X − EX|^p)^{1/p}, p ≥ 1, generated by the central absolute moments of the actual mixture variable X. The bounds depend merely on the mixing distributions S and T, and on the parameter p of the measurement units σ_p. They are valid for all possible families {F_θ}_{θ∈R}, and the resulting mixture distributions G and H. The only restriction is that X has a finite pth raw moment E|X|^p of the chosen order p ≥ 1, and Y has a finite expectation. Similarly, under the condition that the actual mixture variable X has a bounded support, we determine upper and lower bounds on EY − EX gauged in the scale units σ_∞ = ess sup |X − EX|. First we exclude the possibilities for which we obtain trivial infinite bounds on EY − EX, independently of the choice of the scale units. Assume first that S and T are such that H([0, 1]) = [0, ū] ≠ [0, 1], i.e., condition (2.5) holds. Then clearly one can find a partition Δ_1 ≺ Δ_2 of R such that (2.5) is satisfied with T(Δ_2) = 1 − ū. Take the family of distributions built of F_U and F_P, where F_U(x) = x, 0 ≤ x ≤ 1, and F_P(x) = 1 − 1/x, x ≥ 1, are the standard uniform and Pareto distributions with shape parameter 1, respectively. Then obviously G = F_U, H = (1 − ū)F_U + ū F_P, and so EX = 1/2 and 1/4 ≤ σ_p ≤ 1/2 for all 1 ≤ p ≤ +∞, whereas EY = +∞.
Similarly, when (2.7) is violated, we are able to choose an ordered family such that its mixtures with respect to S and T have a bounded support and expectation −∞, respectively. Therefore assumptions (2.4) and (2.7) are necessary for obtaining nontrivial finite upper and lower bounds on EY − EX.
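The divergence in the construction above comes from the shape-1 Pareto component: its truncated mean ∫_1^M x dF_P(x) = ln M grows without bound, while the uniform part of the mixture contributes a fixed mean of 1/2. A quick numerical sketch (the function name is ours):

```python
import math

def pareto_truncated_mean(M, steps=100000):
    """Midpoint-rule approximation of the truncated mean of F_P(x) = 1 - 1/x,
    i.e. of the integral of x * x**(-2) = 1/x over [1, M], which equals ln M."""
    h = (M - 1) / steps
    return sum(h / (1 + (i + 0.5) * h) for i in range(steps))

# the truncated mean grows logarithmically in the truncation level M
for M in (10, 100, 1000):
    print(M, round(pareto_truncated_mean(M), 3), round(math.log(M), 3))
```

Hence any mixture putting a fixed positive mass on F_P has infinite expectation, exactly as used in the counterexample.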
Other extremities may occur when S and T are stochastically ordered. If S(θ) ≥ T(θ), θ ∈ R, then all the points of (2.2) lie either above or on the diagonal {(u, u) : 0 ≤ u ≤ 1} of the unit square. If we additionally assume (2.4), we obtain H(u) = u, 0 ≤ u ≤ 1. By Theorem 1, we get H(x) ≥ G(x), x ∈ R, and EY ≤ EX for an arbitrary choice of ordered {F_θ}_{θ∈R} for which the expectations exist. Similarly, the assumptions S ≤ T and (2.7) imply EY ≥ EX for any {F_θ}_{θ∈R}. Determination of the nonpositive upper bounds and the nonnegative lower ones for (EY − EX)/σ_p, 1 ≤ p ≤ +∞, when S ≥ T and S ≤ T, respectively, needs more subtle tools than those used below, and will be treated elsewhere. Now we present strictly positive upper bounds and strictly negative lower bounds on (EY − EX)/σ_p for 1 < p < ∞, p = 1, and p = +∞ in Propositions 1, 2, and 3, respectively. Due to its importance and relative simplicity, the case p = 2 is especially distinguished in Corollary 1. The bounds depend on the mixing distributions S and T, and on the scale unit parameter p, and all of them are optimal. We describe the conditions of their attainability by specifying the mixture distribution G = G(S, T, p) (see (1.2)). Constructions of the families of mixed distributions providing the stochastically minimal and maximal mixture distributions H (and, in consequence, the maximal and minimal expectations EY, respectively) for given S, T, and G = G(S, T, p) are described in the proof of Theorem 1.

Proposition 1 Let X and Y be random variables with distribution functions which
are the mixtures of some stochastically ordered families of distributions {F θ } θ∈R with respect to fixed distribution functions S and T , respectively. Assume that X has a finite pth moment for some fixed 1 < p < ∞.

(i) Suppose that S and T satisfy (2.4), and denote by h the right-continuous version of the derivative of the greatest convex minorant of the set (2.2). Then (3.1) holds.
The equality in (3.1) is attained if (1.2) has the right-continuous quantile function satisfying (3.3), where μ ∈ R and σ_p^p ∈ R_+ denote arbitrarily chosen values of the expectation and pth absolute central moment of G, respectively.
(ii) Let S and T satisfy (2.7), and let h̄ denote the right derivative of the smallest concave majorant of (2.2).
Then the corresponding lower bound holds, with μ and σ_p^p denoting arbitrarily chosen values of the expectation and pth absolute central moment of G.
Proof We prove statement (i). Note first that, by definition, h is a well-defined, nonnegative, non-decreasing, and right-continuous function. The assumption ∫_0^1 h^{p/(p−1)}(u) du < ∞ implies that the left-hand side of (3.2) is a continuous, strictly decreasing function of c, ranging from ∫_0^1 h^{1/(p−1)}(u) du at c = 0 to 0 as c ↑ h(1), even if the latter is infinite. Similarly, the right-hand side of (3.2) continuously increases from 0 to ∫_0^1 h^{1/(p−1)}(u) du as c varies over the same range.
Owing to Theorem 1, EY does not exceed the expectation of the stochastically maximal mixture. Since we also have μ = μH(1) = ∫_0^1 μ h(u) du, an application of the Hölder inequality yields the estimate which, due to (3.5), gives (3.1). We obtain equality in the Hölder inequality iff the integrands are proportional with a nonnegative proportionality factor, which is fulfilled in (3.3). By (3.2), the right-hand side of (3.3) has zero Lebesgue integral over [0, 1], which guarantees that the expectation of G is equal to μ. Moreover, the pth absolute central moment of G amounts to σ_p^p, as desired. The similar proof of claim (ii) is left to the reader.
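In outline, the estimates combine the quantile representation of the expectations with the Hölder inequality; with q = p/(p − 1) and c denoting the solution of (3.2), a sketch of the chain (our reconstruction of the notation) reads:

```latex
\begin{aligned}
EY - EX &\le \int_0^1 G^{-1}(u)\,h(u)\,du - \int_0^1 G^{-1}(u)\,du
         = \int_0^1 \bigl[G^{-1}(u)-\mu\bigr]\bigl[h(u)-c\bigr]\,du \\
        &\le \Bigl(\int_0^1 \bigl|G^{-1}(u)-\mu\bigr|^{p}\,du\Bigr)^{1/p}
             \Bigl(\int_0^1 \bigl|h(u)-c\bigr|^{q}\,du\Bigr)^{1/q}
         = \sigma_p \Bigl(\int_0^1 \bigl|h(u)-c\bigr|^{q}\,du\Bigr)^{1/q}.
\end{aligned}
```

The middle equality uses ∫_0^1 [h(u) − 1] du = 0 and ∫_0^1 [G^{-1}(u) − μ] du = 0, so that both the constants 1 and c can be inserted for free; equality in the Hölder step forces G^{-1}(u) − μ to be proportional to sgn(h(u) − c)|h(u) − c|^{1/(p−1)}, as in (3.3).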

Remark 5 The assumption ∫_0^1 h^{p/(p−1)}(u) du < ∞ is necessary for the finiteness of the upper bound in Proposition 1 even if (2.4) holds. Indeed, infiniteness of the integral implies that H is not linear in some left neighborhood (1 − ε, 1) of 1, and its graph coincides with a piece of the set (2.2) there. This means that there is a family {Δ_u}_{1−ε<u<1} of ordered intervals such that 1 − ε < u_1 < u_2 < 1 implies Δ_{u_1} ≺ Δ_{u_2}, and ⋃_{1−ε<u<1} Δ_u constitutes a single right-unbounded interval. From the proof of Theorem 1 we conclude that the family {F_θ}_{θ∈R} attaining uniformly the lower bound in (2.1) consists of degenerate distributions for all θ ∈ ⋃_{1−ε<u<1} Δ_u. If we take 1 − ε < u_1 < 1, and replace all the degenerate distribution functions F_θ, θ ∈ ⋃_{u_1<u<1} Δ_u, by a single distribution function determined by an arbitrary element θ_{u_1} of Δ_{u_1}, we do not disturb the ordering of the family or the shape of H on [0, u_1], and we modify the latter into the linear function [(1 − H(u_1))/(1 − u_1)](u − u_1) + H(u_1) on (u_1, 1]. The resulting distribution function H_1 preserves convexity, and its right derivative h_1 is bounded. There exists a unique C_{p,1} being the solution to (3.2) with h replaced by h_1, and the respective bound is finite, where X and Y have distribution functions G_1 and H_1 • G_1, respectively, arising by mixing ordered families of distribution functions with respect to S and T. We repeat the procedure for consecutive elements of an increasing sequence (u_k)_{k=1}^∞ tending to 1. Then the bounded h_k tend increasingly to h, and so do the respective integrals for arbitrary c, and for all C_{p,k} in particular. This implies that the difference EY − EX measured in (E|X − EX|^p)^{1/p} units does not have a finite bound.
Analogously, we verify that under condition (2.5), together with the respective divergence assumption, the lower bound amounts to −∞.
Corollary 1 (i) If (2.4) holds and S and T satisfy the respective integrability condition, then the upper bound for p = 2 follows, and equality is attained iff the corresponding quantile condition holds. (ii) Under assumption (2.7) and the symmetric conditions, the analogous lower bound and attainability condition hold. An essential simplification in the case p = 2 follows from the fact that under the above assumptions we get C_2 = ∫_0^1 h(u) du = 1 in (3.2), and the same equality holds for h̄.
Proposition 2 Let X, Y, and h, h̄ be defined as in Proposition 1, and suppose that X has a finite mean.
(i) Under (2.4) and the boundedness of h, the upper bound (3.6) holds, where h(1) is the left derivative of H at 1. The equality is attained in the limit by a family of mixture distributions {G_ε}_{0<ε<1/2} as ε → 0, where μ and σ_1 are arbitrarily chosen common values of the expectation and the mean absolute deviation from the mean of all the mixture distributions G_ε. (ii) Under (2.7), S and T as above, and the boundedness of h̄, we have the lower bound (3.7), with h̄(1) denoting the left derivative of H̄ at 1. The bound is attained in the limit by the family of mixture distributions {G_ε}_{0<ε<1/2} described in Proposition 2(i) as ε → 0.
The difference between the attainability conditions in Proposition 2(i) and (ii) consists

Proof of Proposition 2
We focus on the proof of statement (i), because that of (ii) is similar, and we omit it here. We start with the relation (3.8). The function h and its left-continuous version are monotone, and continuous at 0 and 1, respectively. Accordingly, they tend to h(0) and h(1) uniformly over the intervals [0, ε] and [1 − ε, 1] as ε → 0. Therefore the right-hand side of (3.8) tends to the asserted bound. Obviously, the families of distributions described in Proposition 2 are not the only ones which attain the bounds in the limit. One can modify them in many ways without disturbing the desired asymptotic properties. If h(u) = h(0) and h(u) = h(1) on some non-degenerate intervals [0, u_0] and [u_1, 1], say, then bound (3.6) is attained non-asymptotically by a single family of ordered distributions. The necessary and sufficient conditions of attainability are then expressed by respective restrictions on G. This happens when H is linear in some vicinities of 0 and 1. Accordingly, the condition for non-limit attainability of the lower bound (3.7) is linearity of H̄ near 0 and 1. It is worth pointing out that such behavior of H and H̄ is not especially extraordinary, and occurs for a significant proportion of pairs S and T. Observe that (3.9) holds under condition (2.4), and (3.10) when (2.7) holds. Miziuła and Solnický (2018, Theorem 1) (see Miziuła 2015, Section 2.3, for a proof completely different from the above) used these notions for presenting the inequalities when both (2.4) and (2.7) are assumed. In fact, the min, max, and zeros can be dropped, because h̄(1) − h̄(0) ≤ 0 ≤ h(1) − h(0) due to (3.9) and (3.10).
Proposition 3 Take X, Y, and h, h̄ defined above and suppose that X has a bounded support.
(i) If (2.4) holds and S and T are as above, then (3.11) holds, with arbitrary μ ∈ R and σ_∞ ∈ R_+ serving as the mean and the maximal absolute deviation from the mean, respectively, of the distribution function G. (ii) Assume (2.7), S and T as above, and let λ_− and λ_+ be defined as in (3.12) and (3.13), respectively, except for h replaced by h̄. Then (3.15) holds, with the above meaning of μ and σ_∞.
Proof (i) We easily obtain (3.11) from the displayed chain of inequalities. The former inequality follows from Theorem 1, and is valid and attainable for any G under an appropriate choice of mixed distributions. The latter becomes an equality under (3.14). If λ_− = λ_+ = 1/2 and, in consequence, the level set {u : h(u) = h(1/2)} is either empty or a degenerate interval, the optimal G is the unique two-point symmetric distribution function supported on μ ± σ_∞. Otherwise, (3.14) is the only assumption necessary for fulfilling the first moment condition. (ii) The proof is omitted due to its similarity to the above.
If λ_− + λ_+ < 1, the necessary condition (3.14) is in particular satisfied by a two- or three-point distribution, which assures that actually ess sup |X − EX| = σ_∞. An analogous construction is possible in Proposition 3(ii).
Also, one can verify that dropping the assumptions on the boundedness and integrability of h (and h̄) in Propositions 2 and 3 implies that the respective upper (and lower) bounds are infinite. We finally point out that strengthening the assumptions on the moments of X with mixture distribution G in Propositions 1-3 makes it possible to relax the respective conditions on the integrability of powers of the functions h and h̄, and, in consequence, on the moments of Y with distribution functions H • G and H̄ • G, respectively.

Examples
We first mention applications of our mixtures in reliability theory. If a system is composed of n identical elements with an arbitrary exchangeable joint lifetime distribution, then the single component lifetime is the discrete uniform mixture of the n stochastically ordered distributions of the component lifetime order statistics, whereas the system lifetime is another convex combination of these distributions, whose probability vector, called the Samaniego signature, depends on the structure of the system (see Samaniego 1985; Navarro et al. 2008). These representations were used by Navarro and Rychlik (2007) for evaluating the distribution functions and expectations of system lifetimes by means of the distribution functions and moments of component lifetimes. A similar idea can be used for calculating bounds on expectations of linear combinations of order statistics based on identically distributed samples (cf Rychlik 1993).
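The signature representation can be illustrated numerically. Below is a hypothetical sketch (the system, lifetimes, sample size, and time point are our choices, not taken from the cited works) for the three-component structure T = min(X_1, max(X_2, X_3)), whose Samaniego signature is (1/3, 2/3, 0): the system lifetime distribution coincides with the signature-weighted mixture of the order statistics distributions.

```python
import random

def check_signature(n=200000, t=0.4, seed=7):
    """Monte Carlo comparison of P(T <= t) for T = min(X1, max(X2, X3))
    with the signature mixture (1/3) F_{1:3}(t) + (2/3) F_{2:3}(t)."""
    rng = random.Random(seed)
    s = (1 / 3, 2 / 3, 0)                 # Samaniego signature of the system
    hits_sys = hits_mix = 0
    for _ in range(n):
        x = [rng.expovariate(1.0) for _ in range(3)]    # i.i.d. exponential lifetimes
        if min(x[0], max(x[1], x[2])) <= t:
            hits_sys += 1
        # draw an order-statistic index according to the signature
        r, k = rng.random(), 0
        while r >= s[k]:
            r -= s[k]
            k += 1
        if sorted(x)[k] <= t:
            hits_mix += 1
    return hits_sys / n, hits_mix / n
```

The two empirical probabilities agree up to Monte Carlo error, in accordance with the representation; the exact common value here is 1 − e^{−t}[1 − (1 − e^{−t})²].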
In the examples below, we restrict ourselves to calculating the sharp upper and lower bounds expressed in the scale units with parameters p = 1, 2, and ∞, because for the other values of p we merely obtain numerical approximations.

Example 1 We immediately notice that the greatest convex minorant and the smallest concave majorant of the given points are piecewise linear functions with respective stepwise right-continuous derivatives h and h̄. Applying Corollary 1, we calculate the standard deviation bounds, and the conclusions of Propositions 2 and 3 follow here as well. After changing the roles of S and T, we get the analogous bounds. The bounds shrink down towards 0 as p increases. This is evident due to the fact that, by the Hölder inequality, the scale units (E|X − EX|^p)^{1/p} increase in p. Besides, the classes of distributions G with finite pth moments decrease as p increases. The only exception here is made by the lower bounds on EY − EX, which are still equal to −1/5. This is caused by the fact that these bounds are attained for two-point symmetric distributions G, for which (E|X − EX|^p)^{1/p} are equal for all 1 ≤ p ≤ +∞.
Example 2 Suppose that the actual mixing distribution function is the power one, S(θ) = 1 − (1 − θ)^α, 0 < θ < 1, for some α > 0, whereas it is assumed that the mixing distribution function is another member of the power family, T(θ) = 1 − (1 − θ)^β, 0 < θ < 1, with β > 0. Then the set (2.2) is the graph of v(u) = 1 − (1 − u)^{β/α}, which implies that v itself is convex if α > β, and concave when α < β. Accordingly, the greatest convex minorant and the smallest concave majorant of (2.2) have the forms H(u) = 1 − (1 − u)^{β/α} and H̄(u) = u, and their derivatives are h(u) = (β/α)(1 − u)^{β/α−1} and h̄(u) = 1, when α > β, and the functions change their roles if α < β. By Theorem 1, every mixture distribution H of a stochastically ordered family {F_θ}_{θ∈R} with respect to T satisfies the inequalities 1 − (1 − G(x))^{β/α} ≤ H(x) ≤ G(x), x ∈ R, when α > β and the mixture of {F_θ}_{θ∈R} with respect to S is G, and the inequalities are reversed for α < β.
Below we discuss the expectation bounds for the case α > β in greater detail, and merely mention the final results for α < β, which are derived analogously. By Proposition 1, the upper bound on (EY − EX)/(E|X − EX|^p)^{1/p} is finite for a given 1 < p < ∞ when ∫_0^1 (1 − u)^{(β/α−1)p/(p−1)} du < ∞, i.e., when the exponent is greater than −1, which is equivalent to p > α/β. However, the bound does not have an analytic form except for the case p = 2. The lower one is 0 for all 1 < p < ∞ (and for p = 1, ∞ as well). When α < β, the upper bound vanishes, and the lower one is negative and finite for all 1 ≤ p ≤ ∞.
For α ≥ 2β, the right-hand side expression is replaced by +∞. Similarly, we obtain the respective bounds for α > β in the cases p = 1 and p = ∞, but in the case α < β the respective lower bound is nontrivial, and we have its explicit form as well. For p = ∞, we have H(1/2) = 1 − 2^{−β/α}, and so the bounds follow for α > β. We easily check that the inequalities are reversed for α < β. However, we do not claim that the zero bounds are optimal here. We finally observe that all the nonzero bounds tend to 0 as β → α.
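For p = 2 the bound admits a closed form in this family: with h(u) = (β/α)(1 − u)^{β/α−1}, direct integration gives ∫_0^1 h²(u) du = β²/(α(2β − α)) for β < α < 2β, so a bound of the form (∫_0^1 h²(u) du − 1)^{1/2} equals (β²/(α(2β − α)) − 1)^{1/2} and explodes as α ↑ 2β. A numerical sketch (the function names, and this reading of the p = 2 bound, are our assumptions):

```python
def b2_closed(alpha, beta):
    """Closed-form (integral of h^2 minus 1)^{1/2} for
    h(u) = (beta/alpha) (1-u)^{beta/alpha - 1}, valid for beta < alpha < 2*beta."""
    return (beta ** 2 / (alpha * (2 * beta - alpha)) - 1) ** 0.5

def b2_numeric(alpha, beta, steps=400000):
    """Midpoint-rule check of the same quantity."""
    r = beta / alpha
    h2 = sum((r * (1 - (i + 0.5) / steps) ** (r - 1)) ** 2
             for i in range(steps)) / steps
    return (h2 - 1) ** 0.5
```

The bound vanishes as α ↓ β (the two mixing distributions coincide) and diverges as α ↑ 2β, in agreement with the integrability threshold p > α/β.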
Example 3 Suppose that one mixing distribution function is exponential with expectation 2, i.e., S(θ) = 1 − exp(−θ/2), θ > 0, and the alternative one is that of the sum of two independent standard exponentials, that is, T(θ) = 1 − (1 + θ) exp(−θ), θ > 0. Then v(u) = 1 − (1 − u)^2 [1 − 2 ln(1 − u)], 0 < u < 1, represents the set (2.2) for the above choice of S and T. It is easily verified that v ranges over the whole standard unit interval, is strictly increasing, convex on (0, 1 − e^{−1}), and concave on (1 − e^{−1}, 1). The greatest convex minorant H of v first coincides with some convex part of v, and is then linear. The breaking point is determined by the tangency equation v′(u)(1 − u) = 1 − v(u), and amounts to 1 − e^{−1/2} ≈ 0.39347, which satisfies v′(1 − e^{−1/2}) = 2e^{−1/2} ≈ 1.21306. Therefore the greatest convex minorant and its derivative follow accordingly. The smallest concave majorant H̄ of v is first linear, and then identical with v. The change point satisfies the tangency equation u v′(u) = v(u). This does not have an analytic representation, and equals u_0 ≈ 0.87242, so that v′(u_0) ≈ 1.05075. The smallest concave majorant and its derivative follow as well. Note that both functions h and h̄ are bounded here, and hence all the bounds of Propositions 1-3 are finite. For establishing the standard deviation bounds with p = 2, we use an indefinite integral of h², and the upper and lower bounds are computed accordingly. In the case p = 1, we immediately check that e^{−1/2} and −(1/2)v′(u_0) are the upper and lower optimal evaluations. Since H(1/2) = 1 − e^{−1/2} and H̄(1/2) = (1/2)v′(u_0), bounds (3.11) and (3.15) of Proposition 3 take on the forms 2e^{−1/2} and 1 − v′(u_0), respectively. Below we present numerical approximations of the above bounds.
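The two tangency points of this example are easy to confirm numerically. Under the parametrization v(u) = 1 − (1 − u)²[1 − 2 ln(1 − u)] with v′(u) = −4(1 − u) ln(1 − u) (our derivation from the S and T above), a bisection sketch:

```python
import math

def v(u):
    # set (2.2): v(u) = T(S^{-1}(u)) for S(t) = 1 - e^{-t/2}, T(t) = 1 - (1+t) e^{-t}
    return 1 - (1 - u) ** 2 * (1 - 2 * math.log(1 - u))

def dv(u):
    # derivative of v
    return -4 * (1 - u) * math.log(1 - u)

def bisect(f, a, b, iters=100):
    """Plain bisection on a sign-changing bracket [a, b]."""
    fa = f(a)
    for _ in range(iters):
        m = (a + b) / 2
        if fa * f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return (a + b) / 2

# greatest convex minorant: tangent line of v passing through (1, 1)
u_star = bisect(lambda u: dv(u) * (1 - u) - (1 - v(u)), 0.1, 0.9)
# smallest concave majorant: tangent line of v passing through (0, 0)
u_0 = bisect(lambda u: u * dv(u) - v(u), 0.5, 0.999)
```

This reproduces the stated values 1 − e^{−1/2} ≈ 0.39347 with slope 2e^{−1/2} ≈ 1.21306 for the minorant, and u_0 ≈ 0.87242 with v′(u_0) ≈ 1.05075 for the majorant.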

Example 4
We finally consider a pair of seemingly similar symmetric unimodal mixing distributions supported on [0, 1]. One of them, S, has a symmetric triangular density and an atom of size 1/4 at the mode 1/2. The other one, T, has a quadratic density of different forms on [0, 1/2] and [1/2, 1], and a sharp peak at 1/2. Then the pairs (S(θ), T(θ)) with θ < 1/2 represent the graph of a convex increasing function, and those with θ > 1/2 a concave increasing one. The greatest convex minorant of U(S, T) has three different parts: the function v(u) = (8/3)(2/3)^{1/2} u^{3/2} on some right neighborhood of 0, then a straight line joining the graph of v with the point (5/8, 1/2), and ultimately another piece of line passing between (5/8, 1/2) and (1, 1). The point of change from the strictly convex v into the first linear piece is determined by the slope equality condition, which can be rewritten as a cubic equation in the variable replacing √u. This cubic function has three real roots, and only the middle one is located in (0, 1); it determines our change point, and the derivative of the minorant follows accordingly. Since H̄(u) = 1 − H(1 − u) and h̄(u) = h(1 − u), we easily deduce that b_p(S, T) = −B_p(S, T), 1 ≤ p ≤ +∞, which means that the sharp lower bounds are the negatives of their upper counterparts.
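Under our reading of this example (S(θ) = (3/2)θ² and T(θ) = 4θ³ on [0, 1/2], giving v(u) = 4(2u/3)^{3/2} = (8/3)(2/3)^{1/2} u^{3/2} on the convex branch), the slope-equality condition for the tangent line through (5/8, 1/2) reduces to the cubic 4w³ − 5w + 1 = 0 in w = (2u/3)^{1/2}, whose middle root w = (√2 − 1)/2 gives the change point u = 3(3 − 2√2)/8 ≈ 0.0643. A sketch of the check (all explicit formulas here are our reconstruction):

```python
import math

def v(u):
    # convex branch of U(S, T): v(u) = 4 (2u/3)^{3/2}
    return 4 * (2 * u / 3) ** 1.5

def dv(u):
    # v'(u) = 4 (2u/3)^{1/2}
    return 4 * math.sqrt(2 * u / 3)

w = (math.sqrt(2) - 1) / 2        # middle root of 4w^3 - 5w + 1 = 0
u_change = 1.5 * w * w            # change point u = 3(3 - 2*sqrt(2))/8
```

At u_change the tangent of v indeed passes through (5/8, 1/2), i.e., v′(u)(5/8 − u) = 1/2 − v(u).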
Proof of Theorem 1

For fixed 0 < u ≤ 1, we solve an auxiliary minimization problem for discrete approximations of F_θ(x), θ ∈ R. Consider a finite partition of the real axis, R = ⋃_{i=1}^n Δ_i, where Δ_1 ≺ ⋯ ≺ Δ_n are disjoint ordered intervals. Each interval either has or does not have its left and right end-points. Define s_i = S(Δ_i), t_i = T(Δ_i), and denote by u_i the unknown constant values of F_θ(x) on Δ_i, i = 1, …, n. The only restrictions on them are 1 ≥ u_1 ≥ ⋯ ≥ u_n ≥ 0, and ∫_R F_θ(x) S(dθ) = Σ_{i=1}^n s_i u_i = u. Our auxiliary problem is to minimize ∫_R F_θ(x) T(dθ) = Σ_{i=1}^n t_i u_i under the above constraints on u_1, …, u_n. After the change of variables v_i = u_i − u_{i+1}, i = 1, …, n (with u_{n+1} = 0), the partial sums S_i = Σ_{k=1}^i s_k and T_i = Σ_{k=1}^i t_k, i = 1, …, n, are known and satisfy 0 ≤ S_1 ≤ ⋯ ≤ S_n = 1 and 0 ≤ T_1 ≤ ⋯ ≤ T_n = 1.
Our problem can thus be rewritten as the minimization of Σ_{i=1}^n T_i v_i over the intersection of the simplex S_n = {(v_1, …, v_n) : v_i ≥ 0, Σ_{i=1}^n v_i ≤ 1} and the hyperplane H_S = {(v_1, …, v_n) : Σ_{i=1}^n S_i v_i = u}. The simplex is a convex polyhedron with vertices 0, being the zero vector, and e_i, i = 1, …, n, which are the standard unit vectors in R^n. Clearly, the line segments joining the extreme points constitute the edges of S_n. The intersection S_n ∩ H_S is a convex set as well, and all its extreme points belong to some edges of S_n. All the edges can be represented as segments joining 0 with e_i, or e_i with e_j, i < j. In the latter case, if either u = S_i or u = S_j, then either e_i ∈ H_S or e_j ∈ H_S, respectively. These come under the provisions of the previous case. When S_i < u < S_j, the respective extreme vector has the coordinates of the convex combination of e_i and e_j belonging to H_S. The linear functional Σ_{i=1}^n T_i v_i attains its minimal value on the convex compact set S_n ∩ H_S at some of its extreme points. This means that it suffices to confine the analysis to the finite set of points (5.1) if u ≤ S_i, and (5.2) if S_i < u < S_j. Coming back to the original variables, we conclude that the possible candidates for the minimum points are vectors (u_1, …, u_n) of the forms (5.3) and (5.4). Adding i = 0 with S_0 = 0, we can jointly represent (5.3) and (5.4) by the formula (5.5). Note that for i = 0 and j = n, the first and last options in (5.5) are empty, respectively. Accordingly, for a given u = G(x) and F_θ(x), θ ∈ R, constant on the intervals of a finite partition Δ_1 ≺ ⋯ ≺ Δ_n, there exist 0 ≤ i < j ≤ n with S_i < u ≤ S_j which provide the minimal value of the objective. This means that in this case it suffices to partition the parameter set R into at most three intervals Δ_1 ≺ Δ_2 ≺ Δ_3 such that S(Δ_1) < u ≤ S(Δ_1 ∪ Δ_2). Observe that Δ_1 and/or Δ_3 may be empty, so that S(Δ_1) = 0 and/or S(Δ_1 ∪ Δ_2) = 1. For given S, T, and 0 < u ≤ 1, we define H(u) by (5.6), where the infimum is taken over all partitions Δ_1 ≺ Δ_2 ≺ Δ_3 of the real line satisfying the above conditions for some θ_1 < θ_2, and similar relations hold for T(Δ_1) and T(Δ_2).
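The reduction to at-most-three-valued minimizers can be checked by brute force: scanning only the extreme points (5.5), of the form (1, …, 1, c, …, c, 0, …, 0), never yields a larger objective value than any feasible monotone vector. A sketch (our function names):

```python
def candidate_min(s, t, u):
    """Minimal sum of t_i * u_i over 1 >= u_1 >= ... >= u_n >= 0 with
    sum of s_i * u_i = u, scanning only extreme points (1,...,1, c,...,c, 0,...,0)."""
    n = len(s)
    S, T = [0.0], [0.0]
    for k in range(n):                    # partial sums S_i, T_i
        S.append(S[-1] + s[k])
        T.append(T[-1] + t[k])
    best = float("inf")
    for i in range(n + 1):
        if abs(S[i] - u) < 1e-12:         # pure 0-1 vector: u_1 = ... = u_i = 1
            best = min(best, T[i])
        for j in range(i + 1, n + 1):
            if S[j] > S[i]:
                c = (u - S[i]) / (S[j] - S[i])
                if 0.0 <= c <= 1.0:       # middle block at level c
                    best = min(best, T[i] + c * (T[j] - T[i]))
    return best

# e.g., for s = (1/2, 1/2), t = (1/5, 4/5), u = 1/2 the minimizer is (1, 0)
print(candidate_min([0.5, 0.5], [0.2, 0.8], 0.5))
```

Comparing this value against randomly generated feasible non-increasing vectors confirms that the extreme points (5.5) indeed realize the minimum.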
This means that for a given 0 < u ≤ 1, H(u) is the infimum of the values at argument u of all linear functions joining points (S(θ_1∓), T(θ_1∓)) and (S(θ_2∓), T(θ_2∓)) for which S(θ_1∓) < u ≤ S(θ_2∓). This defines the greatest convex minorant of the set of points {(S(θ−), T(θ−)), (S(θ), T(θ)) : θ ∈ R} on the interval [0, 1] when we let u vary over the interval. Obviously, the greatest convex minorant does not change if we drop the left-limit values.
A trivial but important observation is that the same partition Δ_1 ≺ Δ_2 ≺ Δ_3 may provide the infimum in (5.6) for various u. In particular, if the infimum is attained when S(Δ_2) = 1, then H(u) = u · u(S, T), with u(S, T) defined in (2.6), for every u ∈ [0, 1]. Another extremity occurs when H(u) is approached as S(Δ_2) ↓ 0, and the linear functions in (5.6) tend to the tangent of U(S, T) at u. In this case, one needs to perform separate approximations for single values of u. We now prove that (5.6) satisfies the left-hand side inequality in (2.1). Let F_θ(x) for fixed x be a non-increasing function of θ ∈ R taking values in [0, 1] and satisfying ∫_R F_θ(x) S(dθ) = u. If F_θ(x), θ ∈ R, takes on finitely many values (u_1, …, u_n), we proved above that there exists a function of θ with no more than three values 0, 0 < c < 1, and 1, satisfying the same constraints and yielding a T-mixture which is not greater, and the infimum over these at most three-valued functions is just H(u).
If F_θ(x), θ ∈ R, has an infinite image set, we need more subtle arguments. Since it is monotone, nonnegative, and bounded above, we can represent it as a limit of functions which are non-increasing, have finite numbers of values in [0, 1], and integrate to u with respect to the measure S. Therefore the infimum over all admissible functions is again H(u), which is our claim. We have proved that for every fixed x ∈ R, there exists a function F_θ(x), θ ∈ R, with at most three values in [0, 1], which satisfies (1.2) and provides the minimal value of (1.3), which amounts to (2.3). Letting x vary over R, we obtain a two-variable function F : R² → [0, 1] which can be treated as a family of functions {F_θ}_{θ∈R} in the variable x. Our aim is to prove that the construction defines a family of stochastically ordered distribution functions.
To this end, we first perform some auxiliary considerations. Define A as the set of those u ∈ [0, 1] for which either (u, H(u)) = (S(θ), T(θ)) or (u, H(u)) = (S(θ−), T(θ−)) for some θ. This is a closed subset of [0, 1] which can be represented as an at most countable union of disjoint closed, possibly degenerate intervals A = ⋃_j A_j. Then its interior is a union of disjoint open intervals int A = ⋃_j A°_j. Note that the number of open intervals can be less than the number of the closed ones. The border of A consists of at most countably many separate points. We finally introduce the set B of those u for which neither (u, H(u)) = (S(θ), T(θ)) nor (u, H(u)) = (S(θ−), T(θ−)) holds. Obviously, this is a union of no more than countably many disjoint intervals B = ⋃_j B_j. For every G(x) = u ∈ A there exists θ = θ_u ∈ R such that (u, H(u)) = (S(θ), T(θ)). This θ is not uniquely determined, though. We denote the set of all θ satisfying the condition by Δ_u. We also define Δ_{A_j} = ⋃_{u∈A_j} Δ_u. Similarly we treat the end-points c_k of the intervals B_j; the corresponding θ is unique in the latter case, but otherwise there may be many θ's satisfying the condition. We denote the set of all θ sharing the property by Δ_{c_j}. Finally, for B°_j = B_j \ {c_k, c_l} with c_k < c_l, we define the corresponding parameter set. Then we have a partition (5.7) of the parameter set. For any two elements D_k, D_l of the partition (5.7) with k < l we have D_k ≺ D_l. Moreover, the inequality u < v for some u, v ∈ A_j implies Δ_u ≺ Δ_v as well.
From the solution of the local minimization problems for various u = G(x), we conclude the following minimization conditions. If u ∈ A°_j = (c_k, c_l), say, then (5.8) holds. The notation introduced here for brevity means that θ ≺ (≻) Δ_u when {θ} ≺ (≻) Δ_u, and θ ⪯ (⪰) Δ_u when either {θ} ≺ (≻) Δ_u or θ ∈ Δ_u. For u ∈ B_j = [c_k, c_l], say, we have the four possible cases (5.9)-(5.12). We take the first one if (c_k, H(c_k)) = (S(θ_{c_k}), T(θ_{c_k})) and (c_l, H(c_l)) = (S(θ_{c_l}), T(θ_{c_l})) for θ_{c_k} and θ_{c_l} being some representatives of Δ_{c_k} and Δ_{c_l}, respectively. Moreover, H(G(x)) = T(θ_u) in the case (5.8), and H(G(x)) = T(θ_{c_k}±) + {[T(θ_{c_l}±) − T(θ_{c_k}±)]/[S(θ_{c_l}±) − S(θ_{c_k}±)]}[u − S(θ_{c_k}±)], with properly chosen limits at c_k and c_l, in the other cases. Now we rewrite the above formulae fixing θ ∈ R and letting x vary. Assume first (A1) θ ∈ Δ_u ⊂ Δ_{A_j} for some u = G(x) ∈ A_j ⊂ A.
Then F_θ(x) = 1 by (5.8). Take u > v = G(y) ∈ A (we admit v ∈ A_j as well as v ∈ A_k ≺ A_j). Since Δ_v ≺ Δ_u, condition (5.8) yields F_θ(y) = 0 for all θ ∈ Δ_u. Suppose now that G(y) ∈ B_i for some B_i ≺ A_j. Then Δ_u ≻ Δ_{c_l}, and any of (5.9)-(5.12) implies that F_θ(y) = 0 for all θ ∈ Δ_u. Similar arguments show that F_θ(y) = 1 if G(y) > G(x) and either G(y) ∈ A or G(y) ∈ B. This shows that for every θ ∈ Δ_{G(x)} such that G(x) ∈ A, we have the degenerate distribution function (5.13). Consider now the set of parameters θ such that (A2) Δ_{c_k} ≺ θ ⪯ Δ_{c_l} for some (c_k, c_l) = B°_j, with (c_k, H(c_k)) = (S(θ_{c_k}), T(θ_{c_k})) and (c_l, H(c_l)) = (S(θ_{c_l}), T(θ_{c_l})).
Then for every x such that G(x) = u ∈ [c_k, c_l] we have F_θ(x) = [G(x) − S(θ_{c_k})] / [S(θ_{c_l}) − S(θ_{c_k})].
If G(x) ∈ B_i = [c_p, c_q] ≺ B_j, we obtain c_q ≺ c_k ≺ θ, and each of (5.9)-(5.12) implies F_θ(x) = 0. If G(x) ∈ B_i \ {c_k} = [c_p, c_k), the relation c_k ≺ θ provides the same conclusion by (5.9) and (5.12). When G(x) ∈ B_i = [c_p, c_q] with c_p ≥ c_l, we have two possibilities. When c_p > c_l, we apply the relation c_l ≺ c_p, possibly together with θ ⪯ c_l, to obtain F_θ(x) = 1. Suppose now that G(x) ∈ (c_l, c_q]. Applying formulae (5.9) and (5.11), we conclude again F_θ(x) = 1 from the condition θ ⪯ c_l. Accordingly, for θ's satisfying assumption (A2) we have (5.14), which is a proper distribution function on R.
Using much the same arguments, we deduce the forms of F_θ for parameters θ satisfying the conditions (A3)-(A5), respectively. Assumptions (A1)-(A5) guarantee that the family of distribution functions described in (5.13)-(5.17) is defined for all θ ∈ R. For verifying that it is stochastically ordered, it suffices to refer to formulae (5.8)-(5.12). They show that F_θ(x) is decreasing in θ for every fixed x ∈ R. The proof of the right-hand side inequality in (2.1), as well as of its optimality, is analogous to the above, and therefore it is omitted.