Monotone Function Problems and Statistical Selection Procedures

Monotone function problems are introduced on a very elementary level to reveal the close connection to certain statistical problems. Equations F(x) = c and inequalities F(x) ≥ c with monotone increasing functions F are considered. Solution methods are stated. In the following, it is shown how some important problems of statistics, especially statistical selection problems, can be solved by transformation to monotone function problems.


Monotone Function Problems
Some statistical methods are based on very simple mathematical principles. For practical statisticians, too, it is worth learning something about the mathematical background of these methods. Here, we consider the monotonicity principle which is essential for cumulative distribution functions. Applications are given from Sect. 3 on.
We start with some notation. ℝ is the set of real numbers, ℝ₊ the set of nonnegative real numbers and ℕ the set of nonnegative integers.
Monotone functions F(x) have a lot of simple and remarkable properties. Here, we concentrate on the solution of equations and inequalities determined by monotone increasing functions. Some elementary statements follow, where I ⊆ ℝ is an interval and F ∶ I → ℝ is a function with range R(F) ⊆ ℝ. Here, I can be a closed interval, but also any other interval of ℝ. In particular, for a monotone increasing F the solution set of the inequality F(x) ≥ c is a right-hand subinterval of I (Proposition 1), and for a strictly monotone increasing and continuous F the equation F(x) = c has the unique solution x_c = F^(-1)(c) whenever c ∈ R(F) (Proposition 2).
In the following, we consider functions F given by an integral representation of the form
F(x) = ∫ K(x, z) h(z) dz,   (1)
where the kernel K(x, z) is defined for x ∈ D ⊆ ℝ and for z in C(h), and C(h) is the support of h. There is a stochastic interpretation. Namely, if K is considered as a function K_x(z) of z with fixed parameter x and h is a probability density, then the values F(x) are the expectations E K_x(z) of the functions K_x(z) depending on the continuous random variable z with the probability density function h(z).
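To illustrate the two elementary solution strategies behind Propositions 1 and 2, namely counting up over the natural numbers and bisection over a real interval, the following Python sketch can be used; the function F and the threshold c in it are chosen freely for illustration and are not part of the original material.

import math

def count_up(F, c, n_max=100_000):
    # counting-up algorithm: smallest natural number n with F(n) >= c
    for n in range(1, n_max + 1):
        if F(n) >= c:
            return n
    raise ValueError("no n <= n_max with F(n) >= c")

def bisection(F, c, lo, hi, tol=1e-10):
    # bisection for the zero of F(x) - c, assuming F(lo) <= c <= F(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if F(mid) >= c else (mid, hi)
    return hi

F = lambda x: 1.0 - math.exp(-x)        # a simple strictly increasing example
print(count_up(F, 0.99), bisection(F, 0.99, 0.0, 10.0))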
Example 2 Let (x_1, x_2) be a two-dimensional continuous random vector where x_1 and x_2 are independently distributed. Further, let f_{x_1}(x), f_{x_2}(x) be the corresponding probability density functions (p.d.f.) and F_{x_1}(x), F_{x_2}(x) be the corresponding cumulative distribution functions (c.d.f.). In the special case X = x_1 + x_2, it is known that
F_X(x) = ∫ F_{x_1}(x − z) f_{x_2}(z) dz,
which is of form (1) and corresponds to the convolution of F_{x_1} and f_{x_2}. Correspondingly, the difference X = x_1 − x_2 has the c.d.f.
F_X(x) = ∫ F_{x_1}(x + z) f_{x_2}(z) dz.
This representation is again of form (1). Finally, the quotient X = x_1/x_2 admits an analogous representation which also fits into form (1). If you find other representations in the literature, you will arrive at our formulas by simple substitutions.
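The convolution representation for the sum can be checked numerically. The following sketch compares the integral of form (1) with the known closed-form c.d.f. of x_1 + x_2 for two normal distributions; all parameter values are chosen freely for this illustration.

import numpy as np
from scipy import integrate
from scipy.stats import norm

d1, d2 = norm(0.0, 1.0), norm(1.0, 2.0)      # x_1 and x_2, chosen for illustration

def cdf_sum(x):
    # F_X(x) = int F_{x1}(x - z) f_{x2}(z) dz for X = x_1 + x_2
    return integrate.quad(lambda z: d1.cdf(x - z) * d2.pdf(z), -np.inf, np.inf)[0]

exact = norm(1.0, np.sqrt(1.0**2 + 2.0**2))  # x_1 + x_2 ~ N(0 + 1, 1^2 + 2^2)
for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, cdf_sum(x), exact.cdf(x))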

Convention
In the following, we often omit the infinite limits −∞ and +∞ in integrals of form (1).

Proposition 3 It is supposed that
(a) h is integrable, nonnegative and continuous on the support C(h),
(b) K(x, z) is continuous in x ∈ D for all z ∈ C(h),
(c) 0 ≤ K(x, z) ≤ 1 for all x ∈ D and all z ∈ C(h),
(d) K(x, z) ≤ K(y, z) for all x, y ∈ D with x < y and all z ∈ C(h).
Then the function F in (1) is monotone increasing, continuous, nonnegative and bounded.
Proof Obviously, F(x) in (1) is defined and continuous for all x ∈ D considering the assumptions (a) and (b).
Since the integrand in (1) is nonnegative on D × ℝ observing (a) and (c), the integral values F(x) are also nonnegative. Besides, we find by (c)
F(x) = ∫ K(x, z) h(z) dz ≤ ∫ h(z) dz < ∞
for all x, and therefore, F(x) is bounded. Finally, choosing x < y and taking (c), (d) and the monotonicity of the integral into account, we get
F(y) − F(x) = ∫ (K(y, z) − K(x, z)) h(z) dz ≥ 0.
Therefore, F(x) is monotone increasing.

Remark 1
It is easy to see that the limit F(+∞) ∶= lim_{x→+∞} F(x) exists and can be calculated by interchanging limit operation and integration in (1). By the way, we have
F(+∞) = ∫ K(+∞, z) h(z) dz ≤ ∫ h(z) dz with K(+∞, z) ∶= lim_{x→+∞} K(x, z).

Remark 2 If F(x) is strictly monotone increasing, then F(d) < F(+∞) and F(+∞) is not attained by F(x) (at any x). This is, for example, the case if we modify (d) in Proposition 3 to
(d′) K(x, z) < K(y, z) for all x, y with x < y and for all z.

Proposition 3a Beside the assumptions of Proposition 3, it is supposed that
(c′) h is a p.d.f. and lim_{x→−∞} K(x, z) = 0, lim_{x→+∞} K(x, z) = 1 for all z ∈ C(h).
Then the function F in (1) is a c.d.f. of a continuous random variable.
Proof The assumptions of Proposition 3 guarantee that limit operation and integration can be interchanged. Consequently, we get
F(−∞) = 0 and F(+∞) = ∫ K(+∞, z) h(z) dz = ∫ h(z) dz = 1
because of (c′).
This shows that all properties of a c.d.f. are fulfilled under the given assumptions.

Remark 3 If K(x, z) is partially differentiable w.r.t. x with continuous partial derivative K_x(x, z), then F is also continuously differentiable. The derivative is obtained by
F′(x) = ∫ K_x(x, z) h(z) dz.
Consequently, the gradients for x = d and x = +∞ are
F′(d) = ∫ K_x(d, z) h(z) dz and F′(+∞) = ∫ K_x(+∞, z) h(z) dz.
If K_x(x, z) satisfies analogous conditions as K(x, z) in Proposition 3, then the limit K_x(+∞, z) exists and we get the latter formula again by interchanging limit operation and integration.
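The differentiation rule of Remark 3 can be illustrated numerically. In the following sketch, the kernel K(x, z) = 1 − exp(−xz) on x, z ≥ 0 and the weight h(z) = exp(−z) are chosen freely; for this choice F(x) = x/(1 + x) and F′(x) = 1/(1 + x)² are known in closed form, so both sides of the rule can be compared.

import numpy as np
from scipy import integrate

K   = lambda x, z: 1.0 - np.exp(-x * z)     # kernel, monotone increasing in x
K_x = lambda x, z: z * np.exp(-x * z)       # partial derivative w.r.t. x
h   = lambda z: np.exp(-z)                  # weight (here a p.d.f. on [0, inf))

F       = lambda x: integrate.quad(lambda z: K(x, z) * h(z), 0.0, np.inf)[0]
F_prime = lambda x: integrate.quad(lambda z: K_x(x, z) * h(z), 0.0, np.inf)[0]

x = 2.0
print(F(x), x / (1.0 + x))                  # F(x) = x / (1 + x)
print(F_prime(x), 1.0 / (1.0 + x)**2)       # F'(x) = 1 / (1 + x)^2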

Stochastic Applications
Now we discuss further applications of the integral representation (1) in stochastics.

Example 3
We consider the function F ∶ ℝ → ℝ₊ defined by
F(x) = ∫ F_1(x + z) f_2(z) dz,   (2)
where F_1 is a c.d.f. and f_2 is a p.d.f. of possibly different continuous distributions. We have in representation (1)
K(x, z) = F_1(x + z) and h = f_2.
We can check the assumptions of Proposition 3. For f_2 and F_1(x + z), conditions (a) and (b) are true. Because of 0 ≤ F_1(x + z) ≤ 1 for all x, z, condition (c) is satisfied. Obviously, F_1(x + z) fulfils condition (d), since F_1 is monotone increasing. Therefore, the results of Proposition 3 hold. Besides, condition (c′) in Proposition 3a is true.
By the way, the value
F(0) = ∫ F_1(z) f_2(z) dz
describes the effect of the first distribution relative to the second one; in the setting of Example 2 it equals P(x_1 ≤ x_2) for independent x_1 ∼ F_1 and x_2 with density f_2. Finally, if F_1 is differentiable with F_1′ = f_1, then F is differentiable. The derivative is the p.d.f.
F′(x) = ∫ f_1(x + z) f_2(z) dz.
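For two normal distributions, F in (2) can be compared with the known c.d.f. of the difference x_1 − x_2 (cf. Example 2). The parameters below are chosen freely for this illustration, and F(0) is additionally cross-checked by a Monte Carlo estimate of P(x_1 ≤ x_2).

import numpy as np
from scipy import integrate
from scipy.stats import norm

d1, d2 = norm(0.5, 1.0), norm(1.0, 1.5)           # x_1, x_2 independent (illustrative)

def F(x):
    # F(x) = int F_1(x + z) f_2(z) dz, see (2)
    return integrate.quad(lambda z: d1.cdf(x + z) * d2.pdf(z), -np.inf, np.inf)[0]

diff = norm(0.5 - 1.0, np.sqrt(1.0**2 + 1.5**2))  # c.d.f. of x_1 - x_2
print(F(0.0), diff.cdf(0.0))                      # both equal P(x_1 <= x_2)

rng = np.random.default_rng(0)
x1, x2 = d1.rvs(200_000, random_state=rng), d2.rvs(200_000, random_state=rng)
print(np.mean(x1 <= x2))                          # Monte Carlo cross-check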

Example 4
We introduce the function F ∶ ℝ → ℝ₊ given by
F(x) = ∫ F_0(z + x)^l F_0(z)^m f_0(z) dz,   (3)
where F_0 is the c.d.f. and f_0 is the p.d.f. of a certain continuous distribution and l, m ∈ ℕ. We can put
K(x, z) = F_0(z + x)^l and h(z) = F_0(z)^m f_0(z)
to get the representation (1). Again the assumptions of Proposition 3 are fulfilled. In particular, we have the estimation
F(x) ≤ ∫ F_0(z)^m f_0(z) dz for all x.

Further, it is
F(0) = ∫ F_0(z)^{l+m} f_0(z) dz =∶ J_{l,m} and F(+∞) = ∫ F_0(z)^m f_0(z) dz = J_{0,m}.
It is easy to show that
J_{l,m} = 1/(l + m + 1),
which means especially J_{0,m} = 1/(m + 1). This holds independently of the concrete c.d.f. F_0. In [10: p. 61–62] the proof is given for the case that F_0 is the c.d.f. of the standard normal distribution, but the idea is the same as for general F_0.
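The fact that ∫ F_0(z)^k f_0(z) dz = 1/(k + 1) holds independently of the continuous distribution F_0 can be checked by numerical integration (the substitution u = F_0(z) reduces the integral to ∫₀¹ u^k du). The distributions and exponents in the following sketch are chosen freely.

import numpy as np
from scipy import integrate
from scipy.stats import norm, expon, logistic

l, m = 3, 2
for dist in (norm(), expon(), logistic(loc=1.0, scale=0.7)):
    a, b = dist.support()
    val = integrate.quad(lambda z: dist.cdf(z)**(l + m) * dist.pdf(z), a, b)[0]
    print(dist.dist.name, val, 1.0 / (l + m + 1))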

Because of Remark 3, we get for l ≥ 1 the derivative
F′(x) = l ∫ F_0(z + x)^{l−1} f_0(z + x) F_0(z)^m f_0(z) dz.
This implies that F′(x) > 0 wherever f_0 is positive, so that F is then strictly monotone increasing.
Since the integrand of (3) consists of three factors, we can also split it into the factors F_0(z + x)^l, F_0(z)^m and f_0(z). A simple investigation shows that F_1 ∶= F_0^l is again a c.d.f. of a certain distribution and that F_0^m f_0 is a positive multiple of a p.d.f. f_2 of another distribution, say f_2 ∶= (m + 1) F_0^m f_0. This leads to functions of the more general form
F(x) = ∫ F_1(z + v(x)) f_2(z) dz,
where the real functions F_1, F_2 with F_2′ = f_2 and v occur. Here, the kernel in (1) has the form K(x, z) = F_1(z + v(x)) with a generally nonlinear shift function v in the argument of F_1. If F is to be differentiable, then v also has to be differentiable. The desired properties of F thus have consequences for v. The next result supplies a stochastic statement, where v has additional properties.
A function v ∶ ℝ → ℝ is called coercive if lim_{x→±∞} v(x) = ±∞. Hence, a coercive function is unbounded.

If v is continuous, monotone increasing and coercive, then the function F(x) = ∫ F_1(z + v(x)) f_2(z) dz introduced above is the c.d.f. of a continuous random variable.
Proof We can proceed similarly to Example 3, which is the special case with v(x) = x (see (2)). Namely, because of the coercivity and monotonicity of v it follows that
F(−∞) = 0 and F(+∞) = 1.
If v is continuously differentiable, we get for the derivative
F′(x) = v′(x) ∫ f_1(z + v(x)) f_2(z) dz with f_1 = F_1′.

Statistical Selection Procedures
We refer to [6: chapter 11] and consider a set of a populations A_1, …, A_a which correspond to continuous random variables u_i with c.d.f. F_i, ordered by certain scores q_i (i = 1, …, a) with q_1 ≤ q_2 ≤ ⋯ ≤ q_a. We want to select, on the basis of random samples, the t best populations according to these scores (1 ≤ t < a), i.e. the first class subset G_1 ∶= {A_{a−t+1}, …, A_a}, with a prescribed high probability c of correct selection (0 < c < 1). The a − t remaining populations constitute the second class subset G_2. We follow here the indifference zone formulation, assuming a gap between G_1 and G_2 such that for a separation parameter d > 0 the condition
q_{a−t+1} − q_{a−t} ≥ d
holds. Now random samples u_{i,k} (k = 1, …, n) of constant size n are produced in the populations A_i, which are ranked by estimated scores computed from the u_{i,k}. If G_{1s} is the set of t populations chosen by this sample-based selection, we want to realize
P(CS) ∶= P(G_{1s} = G_1) ≥ c,   (7)
where CS means correct selection. An idea to exploit Proposition 1 or Proposition 2 is to find a monotone function F(x) = F(x; a, t, d) which fulfils P(CS) ≥ F(n) ≥ c for the common sample size x = n. A natural precondition is t < a, since otherwise each selection of t populations is correct. Whether the inequality (7) is fulfilled or not depends on n. The task of this indifference zone problem is to find minimal or at least appropriate values n such that (7) is satisfied. Then we can sample realizations of G_{1s}. In the meantime, there are also interesting studies for two-stage approaches [8], starting with the Gupta subset approach for pre-selection [3] and followed by the Bechhofer approach to reduce the effort of sampling.
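The probability of correct selection in this setting can be estimated by simulation. The following sketch assumes normal populations in the least favorable configuration, i.e. the t best means exceed the remaining ones by exactly the gap d; all parameter values are chosen freely for illustration.

import numpy as np

def pcs_mc(a=6, t=2, n=20, d=0.5, sigma=1.0, reps=20_000, seed=0):
    # Monte Carlo estimate of P(CS) for normal populations in the least
    # favorable configuration: the t best means are shifted upwards by d
    rng = np.random.default_rng(seed)
    mu = np.zeros(a)
    mu[-t:] = d
    hits = 0
    for _ in range(reps):
        means = rng.normal(mu, sigma / np.sqrt(n))   # sample means of size n
        selected = np.argsort(means)[-t:]            # the t largest sample means
        hits += set(selected) == set(range(a - t, a))
    return hits / reps

print(pcs_mc(n=20), pcs_mc(n=80))                    # P(CS) increases with n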

An Analytical Approach to the Bechhofer Selection Problem
Bechhofer supposed in [1] populations A_i (i = 1, 2, …, a) with normally distributed random characteristics u_i ∼ N(μ_i, σ²) of known constant variance σ². The F_i are the corresponding c.d.f. If we have drawn random samples of n observations for all u_i, natural scores are the expectation values q_i = μ_i, which can be estimated by the sample means of the u_i; these are distributed as N(μ_i, σ²/n). In the following, Φ and φ denote the c.d.f. and p.d.f. of the standard normal distribution N(0, 1). The inequality F(n) ≥ c in (9) can be used to determine (minimal) n. This is the Bechhofer selection problem (BSP). It supplies an interesting analytical surplus if we extend the Bechhofer function F ∶ ℕ → ℝ to a real function F ∶ ℝ → ℝ, replacing n formally by the real variable x ≥ 0; this defines the integral (10). Here a, t and r are parameters with the given meanings which are supposed to be fixed in the following. When approximations by numerical procedures are necessary, we use instead of the improper integral in (10) the proper one with finite integration limits −N and N (N > 0 a natural number) and denote this cut function by F_N(x). The proofs of the following statements are given in [10], but can also be easily derived from the more general results in this paper. F and F_N are (independently of the parameters a, t, r)
(a) continuous and nonnegative,
(b) strictly monotone increasing and invertible,
(c) smooth for x > 0, where the derivative can be obtained by partial differentiation with respect to x under the integral.
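A numerical sketch of a Bechhofer-type function follows. The integrand below is the standard least-favorable-configuration expression from the ranking-and-selection literature, t ∫ Φ(z + r√x)^{a−t} (1 − Φ(z))^{t−1} φ(z) dz, used here only as a stand-in; the exact form and normalization of (10) may differ. Passing a finite N computes the cut version F_N.

import numpy as np
from scipy import integrate
from scipy.stats import norm

def bechhofer_F(x, a, t, r, N=None):
    # stand-in for the Bechhofer function; N = None gives the improper integral,
    # otherwise the cut version F_N with limits -N and N is computed
    lam = r * np.sqrt(x)
    integrand = lambda z: t * norm.cdf(z + lam)**(a - t) * norm.sf(z)**(t - 1) * norm.pdf(z)
    lo, hi = (-np.inf, np.inf) if N is None else (-N, N)
    return integrate.quad(integrand, lo, hi)[0]

print(bechhofer_F(10.0, a=5, t=2, r=0.5), bechhofer_F(10.0, a=5, t=2, r=0.5, N=8))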

Solution Methods for the Bechhofer Selection Problem
Observing the results in Proposition 1, we can state a discrete and a continuous variant to solve BSP with the Bechhofer function (10), where 1 − c is the risk of wrong selection. Proposition 7 shows that BSP can be solved by the counting-up algorithm mentioned in Sect. 1; the supplied n = n_c is just the minimal sample size of BSP. Proposition 8 allows us to use more sophisticated zero-problem solvers for F(x) − c = 0. After determination of the zero x = x_c, we round up to the next natural number n = n_c = ⌈x_c⌉ to get the minimal sample size.
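Both variants can be sketched with the stand-in Bechhofer function from above: the counting-up algorithm evaluates F at n = 1, 2, … until F(n) ≥ c, while the continuous variant solves F(x) − c = 0 (here with Brent's method) and rounds up. The parameter values are again chosen freely.

import math
from scipy.optimize import brentq

def n_c_counting(a, t, r, c):
    # discrete variant: counting-up algorithm
    n = 1
    while bechhofer_F(n, a, t, r) < c:
        n += 1
    return n

def n_c_zero(a, t, r, c, x_hi=1e4):
    # continuous variant: zero problem F(x) - c = 0, then round up
    x_c = brentq(lambda x: bechhofer_F(x, a, t, r) - c, 1e-8, x_hi)
    return math.ceil(x_c)

print(n_c_counting(5, 2, 0.5, 0.95), n_c_zero(5, 2, 0.5, 0.95))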
Although the second algorithm will often work faster, especially for large values of n_c, the time saving is rather minor. Numerical effects in the algorithms can lead to small errors. For example, we can arrive at ñ_c = n_c + 1 in bad cases, which is hardly relevant in practical applications. To avoid this effect, the cut number N in F_N can be increased, and the function evaluations in the integrand as well as the integration procedures can be carried out with higher accuracy.
The zero-problem solvers have a considerable advantage if the sample sizes n_c = n_c(a, t, r) are needed for several parameters r. Namely, considering the expression r√x in (10), we get x_c(a, t, kr) = (1/k²) x_c(a, t, r) for k > 0, which also means n_c(a, t, kr) = ⌈(1/k²) x_c(a, t, r)⌉. Hence, we can restrict ourselves to r = 1 and generate the x_c for r ≠ 1 by the formula given above (see the sketch below). Observe that the integral representation in Proposition 9 is a special case of (2) in Example 3.
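A small check of the scaling relation, reusing the bechhofer_F stand-in from above with freely chosen parameters:

from scipy.optimize import brentq

def x_c(a, t, r, c, x_hi=1e4):
    return brentq(lambda x: bechhofer_F(x, a, t, r) - c, 1e-8, x_hi)

x1 = x_c(5, 2, 1.0, 0.95)
k = 2.0                                      # compare r = k with r = 1
print(x_c(5, 2, k * 1.0, 0.95), x1 / k**2)   # both values agree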
The next step includes several independent random variables with the same c.d.f.; they are independently and identically distributed (i.i.d.). Now we can combine the results of Proposition 9 and Proposition 10 to get a statement which allows us to treat selection problems with non-normal distributions as well.

Remark 5
Observe that F is monotone and continuous, but need not be strictly monotone (see the distinction between Proposition 1 and Proposition 2 given in Sect. 1).

Remark 6
If both the x_i and the y_j are i.i.d. with the common c.d.f. F_0, then Proposition 11 supplies for F_0 = F_1 = F_2 the c.d.f. of d. A similar representation already occurred in Example 4 (see (3)).

Indifference Zone Selection Problem for Non-normal Populations
We assume that the a statistical populations can be ranked by transformation according to a common distribution with c.d.f. F_0 and that the t best are to be selected with risk 1 − c of wrong decision. If, for a certain coercive function v with v(0) = 0, the resulting monotone function F satisfies P(CS) ≥ F(n) ≥ c, then the monotone function approach can be used for selection. If we specialize to normal distributions as in the Bechhofer approach, which means the F_i are the c.d.f. of N(μ_i, σ²), we get F_i(z) = F_0(u) = Φ(u) and f_i(z) = (1/σ)·φ(u) with z = μ_i + σ·u, using standardization to N(0, 1). Since the mean value scores of sample size n are distributed as N(μ_i, σ²/n), the gap parameter d is to be replaced by d* = (d/σ)√n = r√n. The function (11) is transformed for l ∶= a − t and m = t accordingly. Hence, we get BSP and the Bechhofer function (10), replacing n by x; here, we have v(n) ∶= r√n. This method can also be applied if we have a populations whose characteristics are not normally distributed. We assume generally different expectations μ_i = E x_i, but constant known (or estimated) variance σ² = V x_i with i = 1, …, a. Consider, for instance, uniformly distributed characteristics with p.d.f. f(x) = 1/(b − a) for x ∈ [a, b] and f(x) = 0 else.
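As an illustration of the non-normal case, the following sketch estimates P(CS) by simulation for uniformly distributed characteristics whose width is chosen so that the common variance equals 1; the gap d, the sample size n and all other values are picked freely and only indicate that the sample means behave approximately as in the normal-based setting.

import numpy as np

def pcs_uniform_mc(a=5, t=2, n=34, d=0.5, reps=20_000, seed=1):
    # uniform populations with common width sqrt(12), i.e. variance 1,
    # in the least favorable configuration (the t best are shifted by d)
    rng = np.random.default_rng(seed)
    width = np.sqrt(12.0)
    lows = np.zeros(a)
    lows[-t:] = d
    hits = 0
    for _ in range(reps):
        samples = rng.uniform(lows[:, None], lows[:, None] + width, size=(a, n))
        means = samples.mean(axis=1)
        selected = np.argsort(means)[-t:]
        hits += set(selected) == set(range(a - t, a))
    return hits / reps

print(pcs_uniform_mc())                      # estimated P(CS) for these choices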