Linearity of regression for overlapping order statistics

We consider the problem of characterizing the continuous distributions for which linearity of regression of overlapping order statistics,
$$\mathbb{E}(X_{i:m}\mid X_{j:n}) = aX_{j:n} + b, \qquad m \le n,$$
holds. Thanks to a new representation of the conditional expectation $\mathbb{E}(X_{i:m}\mid X_{j:n})$ in terms of the conditional expectations $\mathbb{E}(X_{l:n}\mid X_{j:n})$, $l = i, \ldots, n-m+i$, we are able to use the already known approach based on the Rao–Shanbhag version of the integrated Cauchy functional equation. However, this is possible only if j ≤ i or j ≥ n − m + i. In the remaining cases the problem essentially remains open.

Throughout, linearity of regression of overlapping order statistics means that
$$\mathbb{E}(X_{i:m}\mid X_{j:n}) = aX_{j:n} + b, \qquad\qquad (1)$$
where a, b are some real constants, and we want to describe the family of parent distributions for which (1) holds.
The problem has a long history. It goes back to Fisz (1958), who considered the case m = n = i = 2, j = 1, a = 1 and characterized the exponential distribution. This setting was extended in Rogers (1963) with a characterization of the exponential distribution by (1) with m = n, i = j + 1, a = 1. The case of adjacent order statistics was completed in Ferguson (1967), who considered the case m = n, i = j + 1 with no restriction on a and characterized three families of distributions: exponential for a = 1, Pareto for a > 1 and power for 0 < a < 1. A similar result was obtained in the PhD thesis of Pudeg (1991) and independently in Ahsanullah and Wesołowski (1997) for (1) with m = n and i = j + 2. Other attempts at the non-adjacent case were made in Dembińska and Wesołowski (1997) and López-Blázquez and Moreno-Rebollo (1997). Finally, the problem for m = n was completely solved in Dembińska and Wesołowski (1998), denoted in the sequel by DW, where the same triplet of exponential, Pareto and power distributions, or their symmetric (about zero) versions, was characterized by (1) with arbitrary j < i or j > i, respectively. Various recent extensions and complements of this result can be found, e.g., in Beg et al. (2013), Bieniek and Szynal (2003), Cramer et al. (2004), Ferguson (2002) or Gupta and Ahsanullah (2004).
All the previously mentioned papers were concerned with the case of one sample, i.e. m = n. We were able to trace in the literature only two papers dealing with the case m < n. In Ahsanullah and Nevzorov (1999) the authors claim that (1) with i = j = 1 and n > m characterizes the triplet of exponential, Pareto and power distributions as above. In Wesołowski and Gupta (2001) only a very special case, i = m = 1, was considered; see Sect. 5 below for more details.
In the present paper we give a characterization of both triplets of families (exponential, Pareto, power, or their symmetric versions) in the case m ≤ n with j ≤ i or j ≥ n − m + i. Note that this does not cover the case considered in Wesołowski and Gupta (2001), but it does cover the result announced in Ahsanullah and Nevzorov (1999). It appears that in the case considered one can prove the characterization by applying the Rao–Shanbhag version of the integrated Cauchy functional equation (see Rao and Shanbhag 1994), similarly as in DW. This is done in Sect. 4. However, to reduce the problem to one to which this method can be applied, we need to prove a representation of the conditional expectation $\mathbb{E}(X_{i:m}\mid X_{j:n})$ through conditional expectations from a single sample of size n. This is done, in an even more general setting, that is with no restrictions on the relation between i and j, in Sect. 2. In Sect. 3 we observe that a suitable form of linearity of regression (1) for m ≤ n holds for both of the triplets of distributions considered. In Sect. 5 we make some comments regarding the case i < j < n − m + i, which still remains unsolved.

A representation of conditional expectation for overlapping order statistics
In this section we are interested in the conditional moment $\mathbb{E}(X_{i:m}\mid X_{j:n})$ for different values of i, j ∈ ℕ, m < n ∈ ℕ. We will express it as a convex combination of conditional moments of the form $\mathbb{E}(X_{l:n}\mid X_{j:n})$, l = i, i + 1, ..., n − m + i.

Theorem 1 Let X₁, ..., Xₙ be continuous, independent, identically distributed and integrable random variables. Then for any m < n ∈ ℕ,
$$\mathbb{E}(X_{i:m}\mid X_{j:n}) = \sum_{l=i}^{n-m+i} \frac{\binom{l-1}{i-1}\binom{n-l}{m-i}}{\binom{n}{m}}\,\mathbb{E}(X_{l:n}\mid X_{j:n}). \qquad (2)$$

Proof Let us denote the set of all subsets of size m of {1, ..., n} by $C^n_m$. Of course, $\#C^n_m = \binom{n}{m}$. We number the elements of $C^n_m$ arbitrarily as C(1), C(2), ..., and denote by $X^{C(k)}_{i:m}$ the i-th order statistic of $(X_s,\ s \in C(k))$. Due to the fact that the joint distribution of (X₁, ..., Xₙ) is invariant under permutations, with $S_i = \sum_k X^{C(k)}_{i:m}$ we can write
$$\mathbb{E}(X_{i:m}\mid X_{j:n}) = \binom{n}{m}^{-1}\mathbb{E}(S_i\mid X_{j:n}). \qquad (3)$$
Let us consider the event A = {X₁ < X₂ < ⋯ < Xₙ} and an arbitrary l ∈ {1, ..., n}. Obviously, on the event A we have $X_l = X_{l:n}$. Note that if l ∈ {1, ..., i − 1} ∪ {n − m + i + 1, ..., n}, then on A the variable $X_l$ cannot appear in the sum $S_i$. Otherwise, on A the variable $X_l$ appears in the sum $S_i$ as many times as there are m-element subsets of {1, ..., n} which consist of l, exactly i − 1 numbers smaller than l and exactly m − i numbers greater than l. That is,
$$S_i\,I_A = \sum_{l=i}^{n-m+i} \binom{l-1}{i-1}\binom{n-l}{m-i}\,X_{l:n}\,I_A. \qquad (4)$$
Let $S_n$ denote the set of permutations of {1, ..., n}. We may repeat the same reasoning for any event $A_\sigma = \{X_{\sigma(1)} < \cdots < X_{\sigma(n)}\}$, where σ ∈ $S_n$. Consequently, (4) holds with A changed into $A_\sigma$ for any σ ∈ $S_n$. Since the sets $A_\sigma$, σ ∈ $S_n$, are disjoint, we get
$$S_i \sum_{\sigma\in S_n} I_{A_\sigma} = \sum_{l=i}^{n-m+i} \binom{l-1}{i-1}\binom{n-l}{m-i}\,X_{l:n} \sum_{\sigma\in S_n} I_{A_\sigma}. \qquad (5)$$
Now (2) follows from (3) and (5) due to the identity $\sum_{\sigma\in S_n} I_{A_\sigma} = 1$ holding ℙ-a.s.
Remark 1 Note that the coefficients which appear on the right hand side of (2) have a clear probabilistic interpretation. Namely,
$$\frac{\binom{l-1}{i-1}\binom{n-l}{m-i}}{\binom{n}{m}} = \mathbb{P}(X_{i:m} = X_{l:n}), \qquad l = i, \ldots, n-m+i, \qquad (6)$$
because this ratio is exactly the proportion of arrangements of X₁, ..., Xₙ for which $X_{i:m} = X_{l:n}$. Since every permutation of X₁, ..., Xₙ is equally likely, we arrive at (6).
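The counting argument behind Theorem 1 and the interpretation in Remark 1 are easy to probe numerically. The sketch below (the values i = 2, m = 3, n = 5 are an arbitrary illustration) computes the coefficients $\binom{l-1}{i-1}\binom{n-l}{m-i}/\binom{n}{m}$ appearing in (2), checks that they sum to one, and compares them with Monte Carlo frequencies of the event $X_{i:m} = X_{l:n}$:

```python
import math
import random

def overlap_coeff(l, i, m, n):
    """Coefficient of E(X_{l:n}|X_{j:n}) in (2); equals P(X_{i:m} = X_{l:n})."""
    return math.comb(l - 1, i - 1) * math.comb(n - l, m - i) / math.comb(n, m)

i, m, n = 2, 3, 5  # illustrative values with m <= n
coeffs = {l: overlap_coeff(l, i, m, n) for l in range(i, n - m + i + 1)}
assert abs(sum(coeffs.values()) - 1.0) < 1e-12  # a probability distribution in l

# Monte Carlo frequencies of the event X_{i:m} = X_{l:n}.
random.seed(0)
N = 200_000
counts = dict.fromkeys(coeffs, 0)
for _ in range(N):
    x = [random.random() for _ in range(n)]
    xi_m = sorted(x[:m])[i - 1]      # i-th order statistic of the first m
    l = sorted(x).index(xi_m) + 1    # its rank among all n observations
    counts[l] += 1

for l in coeffs:
    assert abs(counts[l] / N - coeffs[l]) < 0.01
```

The continuous parent distribution guarantees that ties occur with probability zero, so the rank lookup is well defined.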

Linearity of regression for exponential, Pareto and power distributions
By PAR(θ; μ; δ) we denote the Pareto distribution with the density
$$f(x) = \frac{\theta(\mu+\delta)^{\theta}}{(x+\delta)^{\theta+1}}\,I_{(\mu,\infty)}(x),$$
where θ > 0 and μ, δ are real constants such that μ + δ > 0. By EXP(λ; γ) we denote the exponential distribution with the density
$$f(x) = \lambda\,e^{-\lambda(x-\gamma)}\,I_{(\gamma,\infty)}(x),$$
where λ > 0 and γ is a real constant. By POW(θ; μ; ν) we denote the power distribution with the density
$$f(x) = \frac{\theta\,(x-\mu)^{\theta-1}}{(\nu-\mu)^{\theta}}\,I_{(\mu,\nu)}(x),$$
where θ > 0 and −∞ < μ < ν < ∞ are real constants. It is well known, see e.g. DW, that for each of the above distributions, for l > j,
$$\mathbb{E}(X_{l:n}\mid X_{j:n}) = \alpha\,X_{j:n} + \beta, \qquad (7)$$
where α and β are constants depending on the distribution and on l, j, n; the formulas for these constants are given on pp. 217–218 of DW. These formulas, together with the representation (2), imply for j ≤ i that
$$\mathbb{E}(X_{i:m}\mid X_{j:n}) = a\,X_{j:n} + b, \qquad (8)$$
where a and b are suitable constants, which in each of the special cases are listed below.
• For the exponential distribution EXP(λ; γ),
• for the Pareto distribution PAR(θ; μ; δ),
• for the power distribution POW(θ; μ; ν).

For any distribution μ of a random variable X, denote by μ⁻ the distribution of −X. Since for Yᵢ = −Xᵢ, i = 1, ..., n, we have $Y_{i:n} = -X_{n-i+1:n}$, it follows that (7) holds for l < j if the distribution of the Xᵢ's is one of the triplet PAR⁻, EXP⁻ or POW⁻. Consequently, (8) holds for this triplet in the case j ∈ {n − m + i, ..., n}.
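For the exponential distribution the slope in (8) is a = 1: by the memoryless property, each $\mathbb{E}(X_{l:n}\mid X_{j:n})$ with l > j equals $X_{j:n}$ plus a constant, and (2) averages such terms. A quick Monte Carlo sanity check of this fact follows; the index choice i = 2, m = 3, j = 1, n = 5 (satisfying j ≤ i, m ≤ n) is an arbitrary illustration:

```python
import random

random.seed(1)
i, m, j, n = 2, 3, 1, 5   # j <= i and m <= n, an arbitrary illustration
N = 100_000
pairs = []
for _ in range(N):
    x = [random.expovariate(1.0) for _ in range(n)]
    pairs.append((sorted(x)[j - 1], sorted(x[:m])[i - 1]))  # (X_{j:n}, X_{i:m})

# Ordinary least squares slope of X_{i:m} on X_{j:n}; when the regression is
# linear, the OLS slope estimates the coefficient a in (8).
mx = sum(p[0] for p in pairs) / N
my = sum(p[1] for p in pairs) / N
cov = sum((p[0] - mx) * (p[1] - my) for p in pairs) / N
var = sum((p[0] - mx) ** 2 for p in pairs) / N
slope = cov / var
assert abs(slope - 1.0) < 0.05   # a = 1 for the exponential distribution
```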

Characterization in the case j ≤ i or j ≥ n − m + i
These three distributions, of type μ or of the related type μ⁻, appear to be the only possible distributions of the Xᵢ's for which (8) holds with j ≤ i or, respectively, with j ≥ n − m + i.
Before we give the proof of our main result we recall a result on the possible solutions of the integrated Cauchy functional equation. Following the method of DW, we will use this result in the proof of the characterization. Let λ denote the Lebesgue measure on ℝ₊.
Theorem 2 (Rao and Shanbhag 1994) Consider the integrated Cauchy functional equation for H, where μ is a non-arithmetic σ-finite measure on ℝ₊ and H : ℝ₊ → ℝ₊ is a Borel measurable, either non-decreasing or non-increasing (λ-a.e.) function that is locally λ-integrable and not identically equal to zero λ-a.e. Then there exists η ∈ ℝ such that $\int_{\mathbb{R}_+} e^{\eta x}\,\mu(dx) = 1$ and H has an exponential form determined by some constants α, β, γ. If the constant c appearing in the equation equals 0, then γ = −α and β = 0.

Now we are ready to state, and then to prove, our main result, which is a characterization of both triplets of distributions described in Sect. 3 by linearity of regression of order statistics from overlapping samples.
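The structure asserted by Theorem 2 can be illustrated numerically for the basic form of the integrated Cauchy functional equation, $H(x) = \int_{\mathbb{R}_+} H(x+y)\,\mu(dy)$. The measure μ(dy) = 2e^{−y} dy below is our own illustrative choice (not taken from the paper): the characteristic equation $\int e^{\eta y}\mu(dy) = 2/(1-\eta) = 1$ gives η = −1, so H(x) = e^{ηx} = e^{−x} should solve the equation, which we verify by quadrature:

```python
import math

# Illustrative sigma-finite, non-arithmetic measure: mu(dy) = 2 e^{-y} dy on R_+.
def mu_density(y):
    return 2.0 * math.exp(-y)

# Candidate solution H(x) = e^{eta * x} with eta = -1 solving 2/(1 - eta) = 1.
def H(x):
    return math.exp(-x)

def rhs(x, upper=40.0, steps=400_000):
    """Trapezoidal quadrature of int_0^upper H(x + y) mu(y) dy."""
    h = upper / steps
    total = 0.5 * (H(x) * mu_density(0.0) + H(x + upper) * mu_density(upper))
    for k in range(1, steps):
        y = k * h
        total += H(x + y) * mu_density(y)
    return total * h

# H satisfies H(x) = int H(x + y) mu(dy) at several test points.
for x in (0.0, 0.7, 2.5):
    assert abs(rhs(x) - H(x)) < 1e-6
```

The tail beyond the truncation point contributes only on the order of e^{−80}, far below the tolerance used.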
Proof Let us note that if X has a continuous distribution function F, then in the case j < l the conditional distribution of $X_{l:n}$ given $X_{j:n} = x$ has the form
$$dF_{X_{l:n}|X_{j:n}=x}(y) = \frac{(n-j)!}{(l-j-1)!\,(n-l)!}\left(\frac{F(y)-F(x)}{1-F(x)}\right)^{l-j-1}\left(\frac{1-F(y)}{1-F(x)}\right)^{n-l}\frac{dF(y)}{1-F(x)}, \quad y > x. \qquad (12)$$
Alternatively, for continuous F the conditional distribution of $X_{l:n}$ given $X_{j:n} = x$ is the same as the distribution of $Y_{l-j:n-j}$ for $Y_i$, i = 1, ..., n − j, which are iid with the common distribution function
$$F_Y(y) = \frac{F(y)-F(x)}{1-F(x)}, \quad y \ge x, \qquad F_Y(y) = 0 \text{ otherwise.}$$
This fact seems to be well known for continuous parent distributions (in particular, it was used in DW). Since in the basic monographs by Arnold et al. (1992) and David and Nagaraja (2003) it is stated only in the absolutely continuous case, while in Nevzorov (2001) it is formulated for continuous distributions but proved only in the absolutely continuous case, for the sake of completeness we sketch its proof here. From the well known general formula for the distribution function of $X_{k:n}$ (see, e.g., (2.2.15) in Arnold et al. (1992)), since in the continuous case $F(X_i)$ has the uniform distribution on (0, 1), one gets
$$F_{k:n}(y) = \sum_{r=k}^{n}\binom{n}{r}F^r(y)\,(1-F(y))^{n-r}$$
for any k = 1, ..., n. Therefore, to prove formula (12) it suffices to check (an elementary computation) that with $dF_{X_{l:n}|X_{j:n}=x}(y)$ defined by (12) the identity
$$dF_{l:n}(y) = \int_{-\infty}^{y} dF_{X_{l:n}|X_{j:n}=x}(y)\,dF_{j:n}(x)$$
holds for any y ∈ ℝ.
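The $Y_{l-j:n-j}$ representation above is easy to probe by simulation. For Uniform(0,1) samples, $F_Y$ is the distribution function of Uniform(x, 1), so given $X_{j:n} = x$ the conditional mean of $X_{l:n}$ should be x + (1 − x)(l − j)/(n − j + 1). A sketch (the indices n = 5, j = 2, l = 4 and the conditioning point are arbitrary illustrations; conditioning is approximated by a narrow bin):

```python
import random

random.seed(2)
n, j, l = 5, 2, 4          # arbitrary illustrative indices with j < l
x0, eps = 0.3, 0.01        # condition on X_{j:n} falling near x0
N = 500_000

vals = []
for _ in range(N):
    s = sorted(random.random() for _ in range(n))
    if abs(s[j - 1] - x0) < eps:
        vals.append(s[l - 1])

# Given X_{j:n} = x0, X_{l:n} behaves like the (l-j)-th order statistic of
# n-j iid Uniform(x0, 1) variables, whose mean is:
predicted = x0 + (1 - x0) * (l - j) / (n - j + 1)
empirical = sum(vals) / len(vals)
assert abs(empirical - predicted) < 0.01
```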
Let us first consider the case j < i. From (2) and (12) we obtain equation (13). Observe that there does not exist an interval (c, d), $l_F < c < d < r_F$, on which F is constant: the right hand side of (13) is either strictly increasing or strictly decreasing, and both sides of this equation are continuous, so they could not be equal again at the next point of increase of F. Therefore $(l_F, r_F)$ is the support of the distribution given by F, and F is strictly increasing on this interval. Both sides of the second equation in (13) are continuous with respect to x, so it holds for any x ∈ $(l_F, r_F)$.
Substituting the survival function $\bar F = 1 - F$ (which is strictly decreasing on $(l_F, r_F)$) into (13), we arrive at (14). Note that the left hand side is strictly increasing in x, and thus a has to be positive. Substituting F(x) = w in (14), which implies $x = F^{-1}(w)$, dividing both sides by a, and substituting again $t = e^{-u}$ and $w = e^{-v}$ for v > 0, after changing the sum of integrals into an integral of sums we arrive at (15), where μ is a finite measure on ℝ₊, absolutely continuous with respect to the Lebesgue measure. Note that H is strictly increasing on [0, ∞) as a composition of two strictly decreasing functions. The assumptions of the Rao–Shanbhag theorem are satisfied, so H has the form given there, with some constants α, β, γ, δ, η and $\int_{\mathbb{R}_+} e^{\eta x}\,\mu(dx) = 1$. To find the relation between η and a we rewrite (15); performing the integration on the right hand side (note that necessarily η < m − i + 1, otherwise the integrals are infinite), we arrive at (16), which expresses a through a function $h_l$ of η. Since the function $h_l$ is strictly increasing on (−∞, m − i + 1), it follows from (16) that for a given coefficient a there exists a unique η satisfying (15).

Let us now consider the case j = i. From (12) we get, instead of (14), a slightly different equation. Similarly as in the case above, we make the same substitutions and use the Rao–Shanbhag theorem to arrive at the solution H. The only difference is the equation for a, which gives the same condition on the parameter a as in the case j < i.

Before computing the parameters of the distributions we have arrived at, let us explain why the solution of the case j ≤ i also yields the solution in the case j ≥ n − m + i. Define $Y_k = -X_k$, k = 1, ..., n, and consider the order statistics of the random vector (Y₁, ..., Yₙ). Since $Y_{k:n} = -X_{n-k+1:n}$, for j ≥ n − m + i the regression condition (8) for the X's can be rewritten as a regression condition for the Y's with indices satisfying the constraint of the case j ≤ i.

We will find the distribution functions only in the case j < i (for i = j the derivation is almost exactly the same and is skipped; in the case j ≥ n − m + i one has again to refer to the representation $Y_k = -X_k$ and use the results of the case j ≤ i). For η ≠ 0, from the definition of H we get (17). Hence, for z > γ, consider now three cases:

(1) a < 1 and η < 0. Then (17) for z ∈ (μ, ν) can be written as the POW distribution function, where ν = α + γ, μ = γ, θ = −1/η > 0; notice that α has to be positive. Hence X₁ has the POW(θ; μ; ν) distribution and (a) θ = −1/η, where η satisfies (16); (b) ν may be calculated from (11) with θ = −1/η; (c) μ is a real number such that μ < ν.
(2) a > 1 and η > 0. Then, analogously, (17) can be written as the PAR distribution function, so that X₁ has a PAR(θ; μ; δ) distribution with parameters determined from η, α, γ in the same manner as above.

(3) a = 1 and η = 0. Then by the definition of H we get that the survival function of X₁ is exponential, i.e. X₁ has an EXP(λ; γ) distribution.

The case i < j < n − m + i

For i = 1, j = 2, m = 2, n = 4, under the linearity of regression assumption, representation (2) gives the equation
$$\mathbb{E}(X_{1:2}\mid X_{2:4}=x) = \tfrac{1}{2}\,\mathbb{E}(X_{1:4}\mid X_{2:4}=x) + \tfrac{1}{3}\,x + \tfrac{1}{6}\,\mathbb{E}(X_{3:4}\mid X_{2:4}=x) = ax + b.$$
Similarly, for i = 2, j = 3, m = 2, n = 4 we have
$$\mathbb{E}(X_{2:2}\mid X_{3:4}=x) = \tfrac{1}{6}\,\mathbb{E}(X_{2:4}\mid X_{3:4}=x) + \tfrac{1}{3}\,x + \tfrac{1}{2}\,\mathbb{E}(X_{4:4}\mid X_{3:4}=x) = ax + b.$$
These two equations seem to be the simplest unsolved cases. Nevertheless, it can be easily verified that if the sample is taken from a uniform distribution, then both of the above linearity of regression conditions hold true.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
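The uniform claim for the first of the two unsolved cases can be checked numerically. For Uniform(0,1) samples, combining representation (2) with the conditional means of single-sample order statistics yields E(X₁:₂ | X₂:₄ = x) = (25/36)x + 1/18; this reference line is our own elementary computation, stated here only as a target value for the check. The sketch estimates the conditional mean by binning on X₂:₄ and compares it with the line at several points:

```python
import random

random.seed(3)

def cond_mean(x0, eps=0.01, N=500_000):
    """Monte Carlo estimate of E(X_{1:2} | X_{2:4} = x0) for Uniform(0,1);
    the conditioning is approximated by the bin (x0 - eps, x0 + eps)."""
    vals = []
    for _ in range(N):
        x = [random.random() for _ in range(4)]
        if abs(sorted(x)[1] - x0) < eps:   # X_{2:4} near x0
            vals.append(min(x[0], x[1]))   # X_{1:2}, overlapping subsample
    return sum(vals) / len(vals)

# Reference line (25/36) x + 1/18 derived from (2) for the uniform law.
results = {x0: cond_mean(x0) for x0 in (0.2, 0.4, 0.6)}
for x0, est in results.items():
    assert abs(est - (25 / 36 * x0 + 1 / 18)) < 0.01
```

The linearity in x (rather than the specific constants) is the substance of the claim; the same binning check applied to the second case behaves analogously.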