Evaluations of expectations of order statistics and spacings based on IFR distributions

We consider i.i.d. random variables $X_1, \ldots, X_n$ with a distribution function $F$ preceding the exponential distribution function $V$ in the convex transform order, which means that $F$ has an increasing failure rate. We determine sharp upper bounds on the expectations of order statistics and spacings based on $X_1, \ldots, X_n$, expressed in population standard deviation units. We also specify the distributions for which these bounds are attained. Finally, we indicate some reliability applications.


Introduction
Consider an i.i.d. sample $X_1, \ldots, X_n$ based on the common cumulative distribution function $F$ with finite expectation $\mu$ and finite positive variance $\sigma^2$. Let $X_{1:n} \le \cdots \le X_{n:n}$ denote the order statistics based on $X_1, \ldots, X_n$. The density function of the $i$th order statistic from the standard uniform i.i.d. sample of size $n$ is $f_{i:n}(x) = n B_{i-1,n-1}(x)$, $0 < x < 1$, where $B_{k,n}(x) = \binom{n}{k} x^k (1-x)^{n-k}$, $k = 0, \ldots, n$, denote the Bernstein polynomials of order $n$. Then $F_{i:n}(x) = \sum_{k=i}^{n} B_{k,n}(x)$ is the respective distribution function. Order statistics and their linear combinations (called L-statistics for short), especially spacings, play an important role in mathematical statistics and other fields of applied probability. In the present paper we consider the problem of establishing sharp bounds on the expected order statistics and spacings coming from a restricted class of distributions which is defined by means of the convex transform order of van Zwet (1964). We say that $F$ precedes $W$ in the convex transform order ($F \prec_c W$) if the composition $F^{-1}W$ is concave on the support of $W$ (equivalently, $W^{-1}F$ is convex on the support of $F$). We assume here that $W$ is a fixed absolutely continuous distribution function with a positive density $w$ on an interval $[0, d)$ for some $0 < d \le +\infty$. If $W$ is either the uniform or the exponential distribution function, every $F \prec_c W$ has increasing density ($F \sim$ ID) or increasing failure rate ($F \sim$ IFR), respectively.
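The relation between Bernstein polynomials and uniform order statistics can be checked numerically. The sketch below assumes the standard formulas $f_{i:n}(x) = n B_{i-1,n-1}(x)$ and $F_{i:n}(x) = \sum_{k=i}^{n} B_{k,n}(x)$; the function names and the midpoint integration rule are illustrative choices, not part of the paper.

```python
from math import comb


def bernstein(k, n, x):
    """Bernstein polynomial B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k)."""
    return comb(n, k) * x**k * (1.0 - x) ** (n - k)


def uniform_os_density(i, n, x):
    """Density of the i-th order statistic of n i.i.d. standard uniforms."""
    return n * bernstein(i - 1, n - 1, x)


def uniform_os_cdf(i, n, x):
    """Distribution function of the i-th uniform order statistic."""
    return sum(bernstein(k, n, x) for k in range(i, n + 1))


def midpoint_integral(func, lo, hi, steps=20000):
    """Plain midpoint rule; adequate for smooth polynomial integrands."""
    step = (hi - lo) / steps
    return step * sum(func(lo + (k + 0.5) * step) for k in range(steps))


if __name__ == "__main__":
    i, n = 3, 5  # arbitrary illustrative choice
    density = lambda u: uniform_os_density(i, n, u)
    # The density should integrate to one, and its partial integral
    # should agree with the Bernstein-sum form of the distribution function.
    print(midpoint_integral(density, 0.0, 1.0))
    print(midpoint_integral(density, 0.0, 0.3), uniform_os_cdf(i, n, 0.3))
```

The same two helper functions are all that is needed to evaluate the functions $f_{j:n}(1 - e^{-x})$ that appear in the IFR setting.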
A lot of papers devoted to bounds on the expectations of order statistics and L-statistics in various nonparametric models have been published so far. The most classical results presenting the bounds on the sample range, maximum and other order statistics and their differences based on sequences of independent observations with arbitrary, identical distributions, expressed in terms of the standard deviation units, were determined by Plackett (1947), Gumbel (1954), Hartley and David (1954) and Moriguti (1953), respectively. Rychlik (1998) and Goroncy (2009) established the optimal positive and nonpositive upper bounds on the expectations of L-statistics coming from arbitrary distributions, respectively.
Sharper evaluations of order statistics and spacings from the classes of distributions with decreasing density and failure rate functions were presented by Gajek and Rychlik (1998), Danielak (2003) and Danielak and Rychlik (2004). They were extended by Rychlik (2002), and Danielak and Rychlik (2003) to more general families of distributions with monotone density and failure rate functions on the average, which are generated by the star ordering. Recently, Rychlik (2014) described the precise upper bounds on the expectations of extreme order statistics based on the ID and IFR distributions. Goroncy and Rychlik (2015) provided general tools for obtaining sharp upper bounds on the expectations of single order statistics and spacings expressed in terms of the population mean and standard deviation, for the families of all parent distributions preceding various W in the convex transform order. They also characterized the distributions which attain the bounds, and specified the general results for the distributions with increasing density functions.
We aim at completing these results with the analogous upper bounds for the distributions with increasing failure rates, i.e. for $E\,\frac{X_{j:n}-\mu}{\sigma}$, $1 \le j \le n$, and $E\,\frac{X_{j+1:n}-X_{j:n}}{\sigma}$, $1 \le j \le n-1$, with parent distribution functions $F$ from the IFR family. A general method of establishing positive sharp upper bounds on the expectations of properly normalized linear combinations of order statistics $E \sum_{i=1}^{n} c_i \frac{X_{i:n}-\mu}{\sigma}$, for arbitrarily fixed $c = (c_1, \ldots, c_n) \in \mathbb{R}^n$, and many other statistical functionals based on restricted nonparametric families of distributions was presented in Gajek and Rychlik (1996). In our setup, it is based on the sequence of relations (1.1), where $P^c_W h$ denotes the projection of a function $h \in L^2([0, d), w(x)dx)$ onto the convex cone (1.2) of nondecreasing and concave elements of this space, and $\|\cdot\|$ is the norm of $L^2([0, d), w(x)dx)$. If the norm in (1.1) is positive, then the bound is attained by the distribution function satisfying (1.3). Goroncy and Rychlik (2015) described the projections onto (1.2) of some particular functions $h$ satisfying the conditions which we list below.
The assumptions are similar to those presented in Danielak and Rychlik (2004). Let us recall some auxiliary functions, which are necessary for determining the projection of $h$ satisfying (A) onto (1.2). We begin with the function $T$ defined in (1.4). It is easy to check that if $T$ vanishes at some $\beta$, then $g(x) = h(\beta)$ is the optimal constant approximation of $h$ restricted to the interval $(\beta, d)$ in the norm of $L^2((\beta, d), w(x)dx)$. Moreover, there exists a unique $a < \beta_* < c$ such that $T(\beta_*) = 0$. Further we take $\lambda_*(\alpha)$ of (1.5), which is the slope of the optimal $L^2$-approximation of $h|_{(0,\alpha)}$ by linear functions $\lambda(x - \alpha) + h(\alpha)$ with the fixed right-end value $h(\alpha)$. We also define the functions $Y$ and $Z$ in (1.6) and (1.7). If $Y(\alpha) \ge 0$ for some $b < \alpha < c$, then the function arising by gluing $\lambda_*(\alpha)(x - \alpha) + h(\alpha)$ to the left of $\alpha$ with $h(x)$ to the right is concave in a neighborhood of $\alpha$. If $Z(\alpha) = 0$, then $\lambda_*(\alpha)(x - \alpha) + h(\alpha)$ is the projection of $h|_{(0,\alpha)}$ onto the subspace of all linear functions in $L^2((0, \alpha), w(x)dx)$. It turns out that the projection of $h$ onto the convex cone (1.2) is either linear increasing, then equal to $h$, and finally constant (written l-h-c for brevity), or first linear and ultimately constant (l-c, respectively). This is precisely described in the following proposition (cf. Goroncy and Rychlik 2015, Proposition 1).
Proposition 1 Assume that the zero $a < \beta_* < c$ of (1.4) belongs to $(b, c)$ ... is the projection of $h$ onto (1.2). Otherwise we define ... Then $\mathcal{Z}$ is nonempty, and $P^c_W h(x) = P_{\alpha_*} h(x)$ for the unique $\alpha_* = \arg\max_{\alpha \in \mathcal{Z}} \|P_\alpha h\|^2$.
The original version of the proposition contained only the necessary condition (1.9) for the parameter $\alpha$ determining the projection of the l-c shape. Here we complete the statement by precisely indicating the parameter $\alpha_*$ of the l-c projection. The set $\mathcal{Z}$ is nonempty by assumption, because the l-c projection is the only option if l-h-c is excluded. Then the breaking point $\alpha_* > \beta_*$ of the broken line projection has to satisfy (1.9). If $\alpha \in \mathcal{Z}$, then the linear increasing part of $P_\alpha h$ is the orthogonal projection of $h|_{(0,\alpha)}$ onto the subspace of linear functions in $L^2((0, \alpha), w(x)dx)$.
Similarly, the constant part is the orthogonal projection of $h|_{(\alpha,d)}$ onto the subspace of constant functions in $L^2((\alpha, d), w(x)dx)$. This implies that for every $\alpha \in \mathcal{Z}$ the functions $P_\alpha h$ and $h - P_\alpha h$ are orthogonal, and in consequence $\|h\|^2 = \|P_\alpha h\|^2 + \|h - P_\alpha h\|^2$.
Since the norm of $h$ is fixed, the function $P_{\alpha_*} h$ minimizing the distance to $h$ is just the one with maximal norm. The projection is unique, and so is $\alpha_*$. It turns out that in the particular problems we consider below, there is only one $\alpha$ satisfying (1.9), and there is no need to compare the norms of different $P_\alpha h$.
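The orthogonality argument above can be illustrated numerically. The following sketch takes the exponential weight $w(x) = e^{-x}$ and the illustrative function $h(x) = f_{3:5}(1 - e^{-x}) - 1$, fits a weighted least squares line on $(0, \alpha)$ and a weighted mean on $(\alpha, \infty)$, and checks the identity $\|h\|^2 = \|P h\|^2 + \|h - P h\|^2$. The breakpoint `alpha`, the truncation point `upper`, and all function names are assumptions made for the illustration; the sketch does not verify condition (1.9) and does not compute the cone projection itself.

```python
import math
from math import comb


def h(x, j=3, n=5):
    """Illustrative h(x) = f_{j:n}(1 - e^{-x}) - 1 (j, n chosen arbitrarily)."""
    s = math.exp(-x)
    return n * comb(n - 1, j - 1) * (1.0 - s) ** (j - 1) * s ** (n - j) - 1.0


def weighted_integral(func, lo, hi, steps=4000):
    """Midpoint rule for the integral of func(x) * e^{-x} over (lo, hi)."""
    step = (hi - lo) / steps
    nodes = (lo + (k + 0.5) * step for k in range(steps))
    return step * sum(func(x) * math.exp(-x) for x in nodes)


def piecewise_projection(alpha, upper=40.0):
    """Orthogonal projections of h: a line on (0, alpha), a constant beyond."""
    m0 = weighted_integral(lambda x: 1.0, 0.0, alpha)
    m1 = weighted_integral(lambda x: x, 0.0, alpha)
    m2 = weighted_integral(lambda x: x * x, 0.0, alpha)
    r0 = weighted_integral(h, 0.0, alpha)
    r1 = weighted_integral(lambda x: x * h(x), 0.0, alpha)
    det = m0 * m2 - m1 * m1
    intercept = (r0 * m2 - r1 * m1) / det  # normal equations of the line fit
    slope = (m0 * r1 - m1 * r0) / det
    const = weighted_integral(h, alpha, upper) / weighted_integral(
        lambda x: 1.0, alpha, upper
    )
    return lambda x: intercept + slope * x if x < alpha else const


def squared_norm(func, alpha, upper=40.0):
    """Squared weighted norm, integrated separately on both sides of alpha."""
    square = lambda x: func(x) ** 2
    return weighted_integral(square, 0.0, alpha) + weighted_integral(
        square, alpha, upper
    )


def pythagoras_gap(alpha):
    """Difference between ||h||^2 and ||Ph||^2 + ||h - Ph||^2."""
    p = piecewise_projection(alpha)
    residual = lambda x: h(x) - p(x)
    return squared_norm(h, alpha) - squared_norm(p, alpha) - squared_norm(residual, alpha)


if __name__ == "__main__":
    print(pythagoras_gap(0.8))
```

The identity holds for any breakpoint, because each piece is an orthogonal projection onto a linear subspace. That is why comparing candidates in $\mathcal{Z}$ reduces to comparing the norms $\|P_\alpha h\|$.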

Increasing failure rate distributions
In this paper we consider distribution functions $F$ which precede the standard exponential distribution function $V(x) = 1 - e^{-x}$, $0 < x < d = \infty$, in the convex transform ordering. Note that $F \prec_c V$ means that the hazard function $V^{-1}F(x) = -\ln[1 - F(x)]$ is convex. In consequence, its derivative, called the failure rate function $\frac{f(x)}{1-F(x)}$ (which exists almost everywhere and has both one-sided derivatives at each point, as does the respective density function $f = F'$), is nondecreasing. Therefore every distribution function $F \prec_c V$ is said to have increasing failure rate ($F \sim$ IFR, for short). Distribution functions with monotone failure rates are of vital interest in various branches of lifetime analysis. In order to calculate sharp mean-variance upper bounds on the expectations of centered L-statistics, we need to project appropriate functions onto the convex cone (2.1) of nondecreasing and concave elements of $L^2([0, \infty), e^{-x}dx)$. For arbitrarily fixed $c_1, \ldots, c_n$, the projected functions are possibly multimodal polynomials of degree $n$ of the argument $e^{-x}$. No universal methods of projecting such functions onto (2.1) are known. We focus here on the single order statistics and spacings for which the respective projected functions satisfy Assumptions (A). They are also the most popular L-statistics used in lifetime analysis, because they represent consecutive failure times of items examined in lifetime experiments, and the time distances between them.
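The IFR condition described above, convexity of $-\ln[1 - F(x)]$, can be probed on a grid. The sketch below uses second differences on an equally spaced grid and the Weibull family $F(x) = 1 - \exp(-x^k)$ as a test case; the family, the grid, and the tolerance are assumptions made for the illustration. A grid check of this kind can only refute convexity or fail to refute it, so it is not a proof.

```python
import math


def cumulative_hazard(cdf, x):
    """Hazard function V^{-1}(F(x)) = -ln(1 - F(x))."""
    return -math.log(1.0 - cdf(x))


def has_convex_hazard(cdf, lo, hi, points=60, tol=1e-9):
    """True if all second differences of the hazard on the grid are >= -tol."""
    step = (hi - lo) / (points - 1)
    values = [cumulative_hazard(cdf, lo + k * step) for k in range(points)]
    return all(
        values[k - 1] - 2.0 * values[k] + values[k + 1] >= -tol
        for k in range(1, points - 1)
    )


def weibull_cdf(shape):
    """Weibull distribution function with unit scale (illustrative family)."""
    return lambda x: 1.0 - math.exp(-(x**shape))


if __name__ == "__main__":
    for shape in (0.5, 1.0, 2.0):
        print(shape, has_convex_hazard(weibull_cdf(shape), 0.1, 3.0))
```

For the Weibull family the hazard function equals $x^k$, so the check is expected to pass for $k \ge 1$ and to fail for $k < 1$.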

Single order statistics
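For an i.i.d. sample $X_1, \ldots, X_n$ with common marginal $F \sim$ IFR, we aim at establishing accurate upper bounds for $E\,\frac{X_{j:n}-\mu}{\sigma}$, $3 \le j \le n-1$. The extreme order statistics with $j = 1, 2$, and $n$ were already treated by Rychlik (2014). Our auxiliary problem is to project the function $h_{j:n}(x) = f_{j:n}(1 - e^{-x}) - 1$, $x > 0$, (2.2) onto (2.1). It satisfies Assumptions (A) with $a = 0$, the inflection point $b = b_{j:n}$, the maximum point $c = c_{j:n}$, and $d = +\infty$. Here the first interval of decrease of (2.2) is empty, which is acceptable. By Proposition 1, the projection of (2.2) onto (2.1) can be either of l-h-c or l-c type.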
In Proposition 2 below, we present the bounds corresponding with the first case. To this end we specify the general functions (1.4)-(1.7) for particular h = h j:n defined in (2.2). Using auxiliary formulas we determine functions on the positive half-axis.
Proposition 2 Suppose that $T_{j:n}(b_{j:n}) < 0$, so that the unique zero $0 < \beta_{j:n} < c_{j:n}$ of (2.7) belongs to $(b_{j:n}, c_{j:n})$. Also, suppose that $\mathcal{Y}_{j:n} = \{b_{j:n} < \alpha < \beta_{j:n} : Y_{j:n}(\alpha) \ge 0 \text{ and } Z_{j:n}(\alpha) = 0\}$ is nonempty. Let $\alpha_{j:n}$ denote the smallest (possibly unique) element of $\mathcal{Y}_{j:n}$, and $\lambda_{j:n} = \lambda_{j:n}(\alpha_{j:n})$. Then $E\,\frac{X_{j:n}-\mu}{\sigma} \le B_{j:n}$, (2.11) where $B_{j:n}$ is the norm of the projection of $h_{j:n}$ onto (2.1). The equality in (2.11) holds for the distribution function represented by the formula (2.12), uniquely determined up to the location and scale parameters $\mu$ and $\sigma$, respectively, with the modified argument $x \mapsto y = \frac{x-\mu}{\sigma} B_{j:n} + 1$, introduced for brevity.

Proof The crucial step of our reasoning consists in showing that the assumptions guarantee that the projection of $h_{j:n}$ onto (2.1) has the l-h-c form
$$P^c_V h_{j:n}(x) = \begin{cases} \lambda_{j:n}(x - \alpha_{j:n}) + h_{j:n}(\alpha_{j:n}), & 0 \le x < \alpha_{j:n}, \\ h_{j:n}(x), & \alpha_{j:n} \le x < \beta_{j:n}, \\ h_{j:n}(\beta_{j:n}), & x \ge \beta_{j:n}. \end{cases} \qquad (2.13)$$
The tools are collected in Proposition 1. Function (2.7) satisfies $T_{j:n}(0) < 0$, $T_{j:n}(c_{j:n}) > 0$, and increases in between (see Goroncy and Rychlik 2015, p. 180). The first necessary condition for (2.13) is that the unique zero $\beta_{j:n}$ of (2.7) belongs to the interval $(b_{j:n}, c_{j:n})$ of concave increase of $h_{j:n}$. This obviously holds iff $T_{j:n}(b_{j:n}) < 0$. The point $\beta_{j:n}$ is the only admissible candidate for the change of the l-h-c type projection from $h_{j:n}$ itself to the constant.
The other condition is that $\mathcal{Y}_{j:n}$ is nonempty, i.e. the interval $(b_{j:n}, \beta_{j:n})$ contains points $\alpha$ satisfying $Y_{j:n}(\alpha) \ge 0$ and $Z_{j:n}(\alpha) = 0$. The latter relation together with $T_{j:n}(\beta_{j:n}) = 0$ implies that the weighted integral of the l-h-c function glued at $\alpha$ and $\beta_{j:n}$ coincides with that of the original function $h_{j:n}$, which is a necessary condition for the projection (see Rychlik 2001a, Lemma 1). Condition $Y_{j:n}(\alpha) \ge 0$ implies that gluing $\lambda_{j:n}(\alpha)(x - \alpha) + h_{j:n}(\alpha)$ with $h_{j:n}(x)$ at $\alpha \in (b_{j:n}, \beta_{j:n})$ produces a concave function in a vicinity of $\alpha$. If there are more points in $\mathcal{Y}_{j:n}$, the projection is constructed with use of the smallest one. Since all the necessary conditions are deduced from the assumptions of Proposition 2, the projection is actually equal to (2.13).
Due to (1.1), an upper bound on the expectation of the standardized $j$th order statistic coincides with the norm of the projection $P^c_V h_{j:n}$. Since $P^c_V h_{j:n} \not\equiv 0$, the bound is sharp. We present here a slightly simpler analytic form of the norm, obtained from an auxiliary identity. Performing some tedious calculations, we arrive at the explicit form (2.12) of the parent IFR distribution function attaining the bound.
Up to linear transformations of the argument, the extreme distribution (2.12) is composed of three parts: the exponential one on the left, the inverse of some increasing part of the density function $f_{j:n}$, and the jump of height $e^{-\beta_{j:n}}$ at the right-end point. We performed a number of numerical verifications of the assumptions of Proposition 2 for moderate $n$ and all $3 \le j \le n-1$. It turns out that Proposition 2 provides the bound for the case $j = 3$, $n = 4$ only. The precise value of the bound and the description of the IFR distribution attaining it are presented in Example 1.

Example 1 The sharp bound on $E\,\frac{X_{3:4}-\mu}{\sigma}$ is attained by a distribution function of the form (2.12). The exponential part on the left has probability 0.44089. The jump on the right has value 0.53753. The contribution of the inverse cubic function between them amounts to 0.02160 only.
Our conjecture is that Example 1 is the only application of Proposition 2. For large $n$, the inflection point $b_{j:n}$ lies close to the maximum point $c_{j:n}$, and in consequence $h_{j:n}(b_{j:n})$ is only slightly less than the maximum $h_{j:n}(c_{j:n})$. However, by definition, $h_{j:n}(\beta_{j:n})$ cannot be too large, because $h_{j:n}(x)$ for large arguments $x$ is essentially less than $h_{j:n}(c_{j:n})$. This implies that $\beta_{j:n} < b_{j:n}$, which violates condition $T_{j:n}(b_{j:n}) < 0$. Even for small $n$, when the condition holds, there is not enough space in the interval $(b_{j:n}, \beta_{j:n})$ for any points $\alpha$ satisfying $Z_{j:n}(\alpha) = 0$ together with $Y_{j:n}(\alpha) \ge 0$.
It seems that, with the exception of the case presented in Example 1, the bounds on the expectations of order statistics from IFR populations are determined with use of the l-c-type projection. These are presented in Proposition 3 below. However, one should be aware that for given $j$ and $n$, the assumptions of Proposition 2 should first be verified and the l-h-c projection excluded before one uses the formulas of Proposition 3. If the assumptions for the l-h-c shape of the projection do not hold, we define the auxiliary functions (2.14)-(2.17).

Proposition 3 Suppose that either $T_{j:n}(b_{j:n}) \ge 0$ or $\mathcal{Y}_{j:n} = \emptyset$ for some fixed $3 \le j \le n-1$. Then the set $\mathcal{Z}_{j:n} = \{\alpha \ge \beta_{j:n} : A_{j:n}(\alpha) = 0, \ \gamma_{j:n}(\alpha) > 0\}$ is nonempty, and $E\,\frac{X_{j:n}-\mu}{\sigma} \le B_{j:n} = B_{j:n}(\alpha_{j:n})$, (2.18) where $\alpha_{j:n} = \arg\max_{\alpha \in \mathcal{Z}_{j:n}} B^2_{j:n}(\alpha)$. The equality in (2.18) holds for the distribution function (2.19) with $y = y(x) = \frac{x-\mu}{\sigma}\frac{B_{j:n}}{\lambda_{j:n}} - \frac{\gamma_{j:n}}{\lambda_{j:n}} + \alpha_{j:n}$, where $\gamma_{j:n} = \gamma_{j:n}(\alpha_{j:n})$ and $\lambda_{j:n} = \lambda_{j:n}(\alpha_{j:n})$.
There is a unique $\alpha_{j:n} \in \mathcal{Z}_{j:n}$ maximizing $\|P_\alpha h_{j:n}\|^2$, and this defines the unique non-zero projection $P^c_V h_{j:n}$. By (1.1), the sharp upper mean-variance bound for the expectation of the $j$th order statistic is $\|P_{\alpha_{j:n}} h_{j:n}\| = B_{j:n} = \left[(\alpha^2_{j:n} + 1)\lambda^2_{j:n} - (\lambda_{j:n} + \gamma_{j:n})^2\right]^{1/2}$. The distribution function attaining the bound is characterized by the relation which determines (2.19).
The bounds in (2.18) are attained by the right truncated and linearly transformed exponential random variables [cf. (2.19)], with the jumps of sizes e −α j:n on the right.
Precisely, $X_i \stackrel{d}{=} \frac{\lambda_{j:n}\sigma}{B_{j:n}}\left[\min\{Y_i - \alpha_{j:n}, 0\} + \frac{\gamma_{j:n}}{\lambda_{j:n}}\right] + \mu$ for $Y_i$, $i = 1, \ldots, n$, being standard exponential. The transformation is defined in this complicated way just to fulfil the first two moment conditions. The convex transform order is invariant under location and scale modifications. This means that every exponential distribution truncated on the right at the level $1 - e^{-\alpha_{j:n}}$ attains the bound for the expected $j$th order statistic standardized with the respective mean and standard deviation.
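The role of the right truncated exponential distributions can be explored numerically. The sketch below evaluates $E(X_{j:n} - \mu)/\sigma$ for $X_i = \min\{Y_i, \alpha\}$ with standard exponential $Y_i$, using $E X_{j:n} = \int_0^\alpha P(Y_{j:n} > x)\,dx$, $\mu = 1 - e^{-\alpha}$ and $\sigma^2 = 1 - 2\alpha e^{-\alpha} - e^{-2\alpha}$, and then scans a grid of truncation points. The grid, the step counts, and the function names are assumptions made for the illustration. The scan does not replace the conditions defining $\mathcal{Z}_{j:n}$; it only shows how the standardized expectation depends on $\alpha$ within this family. For $j = 3$, $n = 5$ the maximum over the grid should come out close to the value 0.37576 quoted for $X_{3:5}$ later in the text.

```python
import math
from math import comb


def os_survival(j, n, x):
    """P(Y_{j:n} > x) for n i.i.d. standard exponential variables."""
    s = math.exp(-x)
    return sum(comb(n, k) * (1.0 - s) ** k * s ** (n - k) for k in range(j))


def standardized_expectation(j, n, alpha, steps=400):
    """E(X_{j:n} - mu) / sigma for X_i = min(Y_i, alpha), Y_i standard exponential."""
    step = alpha / steps
    # E X_{j:n} = integral over (0, alpha) of P(Y_{j:n} > x), midpoint rule.
    mean_os = step * sum(os_survival(j, n, (k + 0.5) * step) for k in range(steps))
    mu = 1.0 - math.exp(-alpha)
    var = 1.0 - 2.0 * alpha * math.exp(-alpha) - math.exp(-2.0 * alpha)
    return (mean_os - mu) / math.sqrt(var)


def scan(j, n, alphas):
    """Largest standardized expectation over the grid, with its alpha."""
    return max((standardized_expectation(j, n, a), a) for a in alphas)


if __name__ == "__main__":
    grid = [0.02 * k for k in range(1, 151)]  # hypothetical grid on (0, 3]
    value, alpha = scan(3, 5, grid)
    print(value, alpha, 1.0 - math.exp(-alpha))
```

The last printed number is the probability $1 - e^{-\alpha}$ of the exponential part, the remaining mass $e^{-\alpha}$ being the atom at the right end of the support.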
Numerical studies show that each set Z j:n contains only one element, and this is used in construction of the projection and calculation of the bound. We cannot prove it formally, though. We were able to do it for the analogous hypothesis in analysis of increasing density distributions. In that case, the counterparts of (2.7)-(2.10) and (2.14)-(2.16) were expressed by linear combinations of Bernstein polynomials. Then we could apply the respective variation diminishing property of Schoenberg (1959) for evaluating the numbers of their zeros and extremes. The property was also useful in analysis of the DFR case, when the Bernstein polynomials of transformed argument α → 1 − e −α were studied. The method is not applicable here, because we consider functions composed of polynomials of argument 1 − e −α combined with small powers of α itself. Accordingly, specifying particular functions h(x) = f j:n (1 − e −x ) − 1, does not allow us to obtain results stronger than those concluded directly from Proposition 1, being stated for general h satisfying (A).
In Table 1 we present numerical values of the bounds $B_{j:n}$ on the standardized expectations of order statistics $E(X_{j:n} - \mu)/\sigma$, $3 \le j \le n-1$, $4 \le n \le 10$, for the increasing failure rate populations. Each bound is accompanied by the value $1 - \exp(-\alpha_{j:n})$, which represents the contribution of the exponential part in the distribution attaining the bound. Parameter $\alpha_{j:n}$ uniquely determines the distribution. It does not appear for $j = 3$, $n = 4$, because the extreme distribution has a more complicated form, precisely described in Example 1. Comparing the obtained bounds with the respective ones for the ID distributions (see Goroncy and Rychlik 2015, Table 1), we note that the bounds in the IFR case are slightly greater. The relations are not surprising, since the ID family of distributions is contained in the IFR family. We can also observe that

Spacings
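We now establish sharp upper bounds on the standardized expectations of spacings $E\,\frac{X_{j+1:n}-X_{j:n}}{\sigma}$. The functions $\tilde h_{j:n}$ to be projected onto (2.1) first decrease to their global minima at $\tilde a_{j:n}$, then increase to their global maxima at $\tilde c_{j:n}$, and finally decrease to zero at $d = +\infty$. In the increase intervals $(\tilde a_{j:n}, \tilde c_{j:n})$, they are first convex and then concave. The tangency points cannot be written down explicitly. These are the unique $\tilde b_{j:n} \in (\tilde a_{j:n}, \tilde c_{j:n})$ solving the respective equations. The bounds and justifications of their sharpness are presented in Propositions 4 and 5. Their statements are similar to those of Propositions 2 and 3, respectively. Their proofs are almost identical with those of their counterparts, and therefore we omit them. The only differences consist in using modifications of functions (2.7)-(2.10) and (2.14)-(2.17). Noting the identity $\tilde h_{j:n} = h_{j+1:n} - h_{j:n}$ and the linearity of operators (1.4)-(1.7) acting on functions $h$, we define $\tilde T_{j:n} = T_{j+1:n} - T_{j:n}$ (2.21) and, analogously, the remaining modified functions.

Proposition 4 Suppose that $\tilde T_{j:n}(\tilde b_{j:n}) < 0$, so that the unique zero $\tilde a_{j:n} < \tilde\beta_{j:n} < \tilde c_{j:n}$ of (2.21) belongs to $(\tilde b_{j:n}, \tilde c_{j:n})$. Also, suppose that $\tilde{\mathcal{Y}}_{j:n} = \{\tilde b_{j:n} < \alpha < \tilde\beta_{j:n} : \tilde Y_{j:n}(\alpha) \ge 0, \ \tilde Z_{j:n}(\alpha) = 0\}$ is nonempty. Let $\tilde\alpha_{j:n}$ denote the smallest (possibly unique) element of $\tilde{\mathcal{Y}}_{j:n}$, $\tilde f_{j:n} = f_{j+1:n} - f_{j:n}$, and $\tilde\lambda_{j:n} = \tilde\lambda_{j:n}(\tilde\alpha_{j:n})$. Then $E\,\frac{X_{j+1:n} - X_{j:n}}{\sigma} \le \tilde B_{j:n}$, where
$$\tilde B^2_{j:n} = \tilde f^2_{j:n}\!\left(1 - e^{-\tilde\alpha_{j:n}}\right)\left(1 - e^{-\tilde\alpha_{j:n}}\right) + 2\tilde\lambda_{j:n}\tilde f_{j:n}\!\left(1 - e^{-\tilde\alpha_{j:n}}\right)\left(1 - e^{-\tilde\alpha_{j:n}} - \tilde\alpha_{j:n}\right) + \tilde\lambda^2_{j:n}\left(\tilde\alpha^2_{j:n} - 2\tilde\alpha_{j:n} + 2 - 2e^{-\tilde\alpha_{j:n}}\right) + \tilde f^2_{j:n}\!\left(1 - e^{-\tilde\beta_{j:n}}\right)e^{-\tilde\beta_{j:n}} + \int_{\tilde\alpha_{j:n}}^{\tilde\beta_{j:n}} \tilde f^2_{j:n}\!\left(1 - e^{-x}\right)e^{-x}\,dx.$$
The equality holds for the distribution function uniquely determined up to the location and scale parameters $\mu$ and $\sigma$, respectively, with the modified argument $x \mapsto y = \frac{x-\mu}{\sigma}\tilde B_{j:n}$.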
We suspect that the assumptions of Proposition 4 never hold. We verified the claim for small $n$. For large $n$, the increase parts of $\tilde h_{j:n}$ are very steep, and only minor highly located fragments are concave. It is very unlikely that their smaller pieces become parts of projections. Now we focus on the extreme spacings with $j = 1$ and $j = n-1$. In the first case, we recall the results of Goroncy and Rychlik (2015, Proposition 6). The bounds (2.22) are valid for arbitrary parent distributions. They are attained by the distribution functions
$$F(x) = \begin{cases} 0, & \frac{x-\mu}{\sigma}\tilde B_{1:n} < -n, \\ \tilde f^{-1}_{1:n}\!\left(\frac{x-\mu}{\sigma}\tilde B_{1:n}\right), & -n \le \frac{x-\mu}{\sigma}\tilde B_{1:n} < \frac{n(n-2)^{n-2}}{(n-1)^{n-1}}, \\ 1, & \frac{x-\mu}{\sigma}\tilde B_{1:n} \ge \frac{n(n-2)^{n-2}}{(n-1)^{n-1}}. \end{cases} \qquad (2.23)$$
They have increasing density functions on their interval supports, and atoms with masses $\frac{n-2}{n-1}$ at the right-end points. This implies that they are IFR as well. Accordingly, the general upper bounds (2.22) are sharp for the IFR distributions. For $n = 2$, the bound in (2.22) reduces to $\frac{2\sqrt{3}}{3}$, and (2.23) has uniform density. This is a special case of the range evaluations due to Plackett (1947).
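The special case $n = 2$ mentioned above is easy to check numerically. The sketch below uses the quantile representation $E(X_{2:2} - X_{1:2}) = \int_0^1 F^{-1}(u)\,[f_{2:2}(u) - f_{1:2}(u)]\,du$ with $f_{2:2}(u) - f_{1:2}(u) = 4u - 2$, and compares the standardized range of the uniform distribution with $2\sqrt{3}/3$. The exponential distribution is added only as a contrasting IFR example; the function names and the midpoint rule are assumptions made for the illustration.

```python
import math


def standardized_range_n2(quantile, sd, steps=20000):
    """E(X_{2:2} - X_{1:2}) / sd computed from the quantile function."""
    step = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * step
        # f_{2:2}(u) - f_{1:2}(u) = 2u - 2(1 - u) = 4u - 2
        total += quantile(u) * (4.0 * u - 2.0)
    return step * total / sd


PLACKETT_BOUND_N2 = 2.0 * math.sqrt(3.0) / 3.0

if __name__ == "__main__":
    uniform = standardized_range_n2(lambda u: u, 1.0 / math.sqrt(12.0))
    exponential = standardized_range_n2(lambda u: -math.log1p(-u), 1.0)
    print(PLACKETT_BOUND_N2, uniform, exponential)
```

The uniform distribution reproduces the bound up to integration error, while the exponential distribution stays strictly below it.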
We finally proceed to the last spacings with $j = n-1$. At the first step, we project the functions $\tilde h_{n-1:n}$ onto (2.1). Starting from the origin, they decrease to the global minimum at $\tilde a_{n-1:n} = \ln\frac{n}{2}$, then convexly increase to the tangency point at $\tilde b_{n-1:n} = \ln \ldots$

Proposition 6 Under Assumptions ($\tilde{\rm A}$), with notation (1.5)-(1.7), the set $\tilde{\mathcal{Y}} = \{\alpha > b : Y(\alpha) \ge 0, \ Z(\alpha) = 0\}$ is nonempty, and $\alpha_* = \inf\{\alpha \in \tilde{\mathcal{Y}}\}$ determines the projection of $h$ onto (2.1).

Outline of proof Following Rychlik (2014, Proposition 3.2), we can show that the projection belongs to a family of functions parametrized by the gluing point $\alpha \ge b$ and the slope $\lambda$ of the initial linear part. The only difference between our assumptions and those of Rychlik (2014) is that the function $h$ in the latter case does not have the decreasing part. The arguments in both cases are identical, though. In particular, they are based on the fact that every concave nondecreasing function crosses the strictly convex increasing part of $h$ at most twice, and this remains true if the convex increasing part is preceded by a strictly decreasing one. We further note that for fixed $\alpha \ge b$, the squared distance between $h$ and the respective element of the family is a quadratic convex function of the argument $\lambda$. It is globally minimized at $\lambda_*(\alpha)$ defined in (1.5), and the constrained minimal slope is $\max\{\lambda_*(\alpha), h'(\alpha)\}$. Now we claim that $Y(\alpha) > 0$ for all sufficiently large $\alpha$. Indeed, the tangent lines $h'(\alpha)(x - \alpha) + h(\alpha)$ lie above $h(x)$ for all $x > 0$ if $\alpha$ is sufficiently large. However, this does not hold for $\lambda_*(\alpha)(x - \alpha) + h(\alpha)$, because this is the best approximation of $h|_{(0,\alpha)}$ by functions $\lambda(x - \alpha) + h(\alpha)$, $\lambda \in \mathbb{R}$. This implies that $\lambda_*(\alpha)(x - \alpha) + h(\alpha)$ and $h(x)$ cross each other in $(0, \alpha)$, and so $\lambda_*(\alpha) > h'(\alpha)$, as required.
We conclude the section with numerical evaluations of the bounds $\tilde B_{j:n}$ for the spacings in small samples, presented in Table 2 below. They are presented together with the probabilities $1 - \exp(-\tilde\alpha_{j:n})$ of the exponential parts of the distributions attaining the bounds. The remaining probability masses $\exp(-\tilde\alpha_{j:n})$ are concentrated at the atoms located at the right ends of the supports. For fixed $n$, smaller bounds can be observed for central spacings, and greater ones hold for the extreme ones. The contributions of the exponential parts in the extreme distributions increase with the spacing ranks.

Possible further developments
(the coefficient at $c_{n+1}$ vanishes in the latter formula). The variation diminishing property (VDP, for short) of Bernstein polynomials (see, e.g., Rychlik 2001b, Lemma 14) can be used in establishing the numbers of increase/decrease and convexity/concavity intervals and verifying the assumptions. It asserts that the number of sign changes of any linear combination of Bernstein polynomials in $(0, 1)$ does not exceed the number of sign changes in the vector of combination coefficients. Moreover, the initial and ultimate signs of the combination coincide with the signs of the first and last non-zero coefficients, respectively.

Apparent applications are provided by reliability theory. If a system is composed of $n$ elements with i.i.d. lifetimes $X_1, \ldots, X_n$ (exchangeability is sufficient here), then the distribution function of the system lifetime $T$ is a convex combination of order statistics distribution functions $P(T \le t) = \sum_{i=1}^{n} s_i P(X_{i:n} \le t)$, where the combination coefficient vector $s = (s_1, \ldots, s_n)$, called the Samaniego signature, depends merely on the system structure. Therefore $ET = E\sum_{i=1}^{n} s_i X_{i:n}$, and our methods can be applied for precise evaluations of $\frac{ET - EX_1}{\sqrt{\mathrm{Var}\,X_1}}$ when the $X_i$ have an IFR distribution.
For the overwhelming majority of the coherent systems, the signature vector is either monotone or unimodal, i.e. first nondecreasing and then nonincreasing. E.g., Navarro and Rubio (2010) showed that there is only one system with a bimodal signature among 180 systems of size 5. Due to (3.1) and VDP, the monotonicity properties are inherited by the respective functions $h_s$. The conclusion is not immediately apparent in the analysis of the second derivatives: then the modifications $\left(1 - \frac{1}{2n-2i-3}\right)c_{i+3} - 2c_{i+2} + \left(1 + \frac{1}{2n-2i-3}\right)c_{i+1}$, instead of the standard second differences $c_{i+3} - 2c_{i+2} + c_{i+1}$, $i = 0, \ldots, n-2$, should be studied.
By VDP, $h_{s_1}$ is first convex increasing, then concave increasing, concave decreasing, and finally convex decreasing. It satisfies the other requirements of (A) with the exponential weight as well. Using the projection of $h_{s_1}$ onto (2.1), we determine the bound on $\frac{ET - EX_1}{\sqrt{\mathrm{Var}\,X_1}}$. The difference from the respective sharp bound without the IFR restriction, $\frac{ET - EX_1}{\sqrt{\mathrm{Var}\,X_1}} \le 0.304111$, is almost unnoticeable. We see that, apart from Example 1, the l-h-c type projections may be useful in the description of bounds for various systems. Finally, we note that $\frac{EX_{3:5} - EX_1}{\sqrt{\mathrm{Var}\,X_1}} \le 0.37576$ in the IFR case.
Example 3 Consider the parallel connection of three single components and the series of two items, whose lifetime is given by $T_2 = \max(X_1, X_2, X_3, \min(X_4, X_5))$.
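The signature of such a system can be obtained by direct enumeration. The sketch below assigns to the five components all possible failure ranks, evaluates the structure function $\max(X_1, X_2, X_3, \min(X_4, X_5))$ on the ranks, and counts how often the system fails at the $i$th component failure. It then combines the signature with the expectations $E X_{i:n} = \sum_{k=n-i+1}^{n} 1/k$ of standard exponential order statistics. The exponential lifetimes are an assumption chosen only because the formula is explicit; the function names are likewise illustrative.

```python
from fractions import Fraction
from itertools import permutations


def system_lifetime(x):
    """Structure of Example 3: three parallel items and a series of two."""
    return max(x[0], x[1], x[2], min(x[3], x[4]))


def signature(structure, n):
    """Signature by enumeration of all equally likely failure orders."""
    counts = [0] * n
    total = 0
    for ranks in permutations(range(1, n + 1)):
        # ranks[k] is the position of component k in the failure order,
        # so the structure function returns the rank at which the system fails.
        counts[structure(ranks) - 1] += 1
        total += 1
    return [Fraction(c, total) for c in counts]


def exponential_os_mean(i, n):
    """E X_{i:n} for standard exponential lifetimes (assumed for illustration)."""
    return sum(Fraction(1, k) for k in range(n - i + 1, n + 1))


def expected_lifetime(sig):
    """ET as the signature-weighted sum of E X_{i:n}."""
    n = len(sig)
    return sum(s * exponential_os_mean(i, n) for i, s in enumerate(sig, start=1))


if __name__ == "__main__":
    sig = signature(system_lifetime, 5)
    print(sig)
    print(expected_lifetime(sig))
```

Exact fractions are used so that the signature entries can be read off without rounding. For other IFR parents only `exponential_os_mean` would have to be replaced.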