Exact prediction intervals for future exponential and Pareto lifetimes based on ordered ranked set sampling of non-random and random size

In this paper, two pivotal statistics are proposed for constructing prediction intervals for future observations from the exponential and Pareto distributions based on ordered ranked set sampling. The study covers two cases: in the first, the sample size is fixed; in the second, the sample size is a positive integer-valued random variable. In addition to deriving explicit forms for the distribution functions of the two pivotal statistics, we consider some special cases for the random sample size. Moreover, a simulation study is carried out to assess the efficiency of the suggested methods. Finally, an example involving lifetime data is analyzed.


Introduction
The idea of ranked set sampling (RSS) was introduced by McIntyre (1952) as a procedure for improving the precision of estimates of the average yield from large plots of arable crops without a large increase in the number of fields from which detailed, expensive, and tedious measurements had to be collected. RSS is a way of overcoming the problems associated with obtaining a non-representative sample from a population. It uses simple random samples (SRSs) to obtain a more structured and representative sample from the population and, consequently, to develop efficient inferential procedures. Takahasi and Wakimoto (1968) provided the mathematical structure for RSS. Balakrishnan (2007) used the concept of order statistics from independent and non-identically distributed (INID) random variables (RVs) to introduce ordered ranked set sampling (ORSS). Moreover, Balakrishnan and Li (2008) proved that the best linear unbiased estimators based on ORSS are more efficient than those based on RSS for the two-parameter exponential, normal, and logistic distributions. Some Bayesian prediction problems based on RSS and ORSS were studied by Mohie El-Din et al. (2015), who obtained two-sample Bayesian prediction intervals for observables from the Pareto distribution based on complete and censored data. Recently, there has been considerable development in the statistical literature on RSS. Several of these works concern the parametric performance of RSS; see, e.g., Adatia (2000), Alodat and Al-Sagheer (2007), Chen et al. (2021), Qian et al. (2021), Sadek et al. (2015), Shaibu and Muttlak (2004), and Stokes (1995). Other works concern the nonparametric performance of RSS; see, e.g., Bhoj (2016), Bohn (1996), Gemayel et al. (2015), and Salehi et al. (2015).
An important problem that experimenters frequently face in life-testing experiments is the prediction of unknown future observations based on the currently available sample (the informative sample). For instance, experimenters or manufacturers would like to set bounds on the lifetimes of their products so that guarantee limits can be set reasonably, and customers would like to know these bounds before purchasing the products. Many authors have considered the prediction of future events, especially future order statistics and generalized order statistics, in life-testing experiments. Among these authors are Aly et al. (2019), Barakat et al. (2014a, 2020, 2021a), Fan et al. (2019), Hsieh (1996), Kaminsky and Nelson (1998), Lawless (2003), Shah et al. (2020), Valiollahi et al. (2017), and Wu et al. (2020).
When the sample size cannot be determined in advance, because some observations are continually lost for various reasons, it is essential to treat the sample size as an integer-valued RV. This situation arises frequently in biological, agricultural, and some quality control problems. Raqab (2001) found prediction intervals for future generalized order statistics based on the exponential distribution. Barakat et al. (2011, 2018) obtained prediction intervals for future exponential lifetimes based on random generalized order statistics. Recently, Barakat et al. (2021b) suggested a new method for constructing an efficient point predictor for future order statistics when the sample size is an RV. The suggested point predictor is based on some characterization properties of the distributions of order statistics.
Censoring plays an important role in reliability and lifetime experiments when the experimenter cannot observe the lifetimes of all test units. The usual censoring schemes are Type-I and Type-II censoring, which do not allow surviving units to be removed from the test at points other than the terminal point of the experiment. Such removal becomes important when a compromise is sought between a reduced experimentation time and the observation of extreme lifetimes. On the basis of a Type-II censored sample, Barakat et al. (2014b) developed two pivotal statistics to construct prediction intervals for future observations from any arbitrary continuous cumulative distribution function (CDF).
Remark 1 Usually, the need for prediction arises naturally in lifetime tests with Type-II censored samples, as in the case of ordinary order statistics and record values. In the RSS case we have two scenarios. Consider a random sample of $n^2$ i.i.d. RVs, each of which measures the lifetime of a repairable parallel system. According to the RSS technique, these RVs are divided into $n$ sets ($n$ SRSs), each containing $n$ RVs. In the first scenario, we apply Type-II censoring to the sets themselves. For example, we put all these sets independently on a lifetime test, wait for the first failure, $x^*_{1:n}$, and then remove the set in which the first failure occurred. We then wait for the second failure among the remaining sets, $x^*_{2:n}$ with $x^*_{1:n} < x^*_{2:n}$, and remove the set in which it occurred. We continue up to the $r$th failure and end the test at this stage, obtaining the observed ORSS $x^*_{1:n} \le x^*_{2:n} \le \dots \le x^*_{r:n}$. In the second scenario, we apply the usual RSS technique to determine the $n$ failed systems (in the RSS technique, these failed systems are assumed to have the random failure times $X_{1(1:n)}, X_{2(2:n)}, \dots, X_{n(n:n)}$). After repairing all these failed items, we put them again on a lifetime test censored at the $r$th failure. In this way, we first get the unordered diagonal values $x_{1(1:n)}, x_{2(2:n)}, \dots, x_{n(n:n)}$, and then, based on the Type-II censored sample, we obtain the first $r$ observed ORSS values $x^*_{1:n} \le x^*_{2:n} \le \dots \le x^*_{r:n}$. The second scenario was adopted in Kotb (2016) and Mohie El-Din et al. (2017).
The novelty of this article lies in applying a general method for predicting future observations from the exponential and Pareto distributions based on ORSS when the sample size is either fixed or a positive integer-valued RV. The rest of this article is organized as follows. In Sect. 2, the basic setup and a description of the lifetime model based on RSS are presented. In Sect. 3, the distributions of the two suggested pivotal quantities are derived. In Sect. 4, to examine the efficiency of the suggested methods, a simulation study is conducted for the exponential and Pareto lifetime distributions with a fixed sample size and with some cases of random sample size, such as binomial and discrete uniform sample sizes. In Sect. 5, a real data set is analyzed using the suggested methods. Finally, Sect. 6 is devoted to some concluding remarks.

The model description
Assume that an SRS of size $n$ is drawn from the population and ranked with respect to the variable of interest. The unit ranked first is then quantified, and the remaining units are not measured. Next, a second SRS of size $n$ is drawn, its units are ranked by judgment, and only the unit ranked second is quantified. This procedure is continued until the $n$th SRS of size $n$ is drawn and ranked, and the unit ranked $n$th is quantified. This yields an observed RSS denoted by $\mathbf{x} = (x_{1(1:n)}, x_{2(2:n)}, \dots, x_{n(n:n)})$, where $x_{i(i:n)} \equiv x_{i:n}$ $(i = 1, 2, \dots, n)$.
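The sampling scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: it assumes perfect ranking (each SRS is sorted on the actual values), uses Exp(1) as a stand-in population, and the helper names `draw_rss`, `draw_orss` and `exp1` are hypothetical. Sorting the RSS gives the ordered sample (ORSS) on which the prediction intervals are based.

```python
import random

def draw_rss(n, sampler, rng):
    """Return an RSS of size n under perfect ranking (an assumption).

    The i-th SRS of size n is ranked and only its i-th smallest unit is kept.
    """
    rss = []
    for i in range(n):
        srs = sorted(sampler(rng) for _ in range(n))  # rank the i-th SRS
        rss.append(srs[i])                            # quantify only the unit of rank i+1
    return rss

def draw_orss(n, sampler, rng):
    """Return the ORSS: the RSS values arranged in ascending order."""
    return sorted(draw_rss(n, sampler, rng))

def exp1(rng):
    """Stand-in population (assumption): standard exponential lifetimes."""
    return rng.expovariate(1.0)

rng = random.Random(2024)
rss = draw_rss(5, exp1, rng)
orss = sorted(rss)
print("RSS :", [round(v, 3) for v in rss])
print("ORSS:", [round(v, 3) for v in orss])
```

Note that the RSS itself is generally not in ascending order, because its entries come from different independent sets; the ordering step is what turns it into an ORSS.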

The suggested pivotal statistics and their CDFs
In this section, we use the pivotal method to construct prediction intervals for the unknown future ORSS value $x^*_{s:n}$ $(s = r+1, r+2, \dots, n)$ based on the observed values $x^*_{i:n}$ $(i = 1, 2, \dots, r)$. The pivotal method uses a pivotal quantity, which is an explicit function of the observed values $x^*_{1:n} \le x^*_{2:n} \le \dots \le x^*_{r:n}$ and the unknown future value $x^*_{s:n}$ $(s = r+1, r+2, \dots, n)$. Moreover, the pivotal function is invertible and has a well-defined CDF; for more details, see Barakat et al. (2014b). Pivotal quantities are essential tools for constructing prediction intervals for future ORSS values. Namely, if $Q$ is a pivotal quantity for the future value $x^*_{s:n}$ $(s = r+1, \dots, n)$, then for any $0 < \delta < 1$ there exist $q_1$ and $q_2$, depending on $\delta$, and two functions $\ell_1(x^*_{1:n}, x^*_{2:n}, \dots, x^*_{r:n})$ and $\ell_2(x^*_{1:n}, x^*_{2:n}, \dots, x^*_{r:n})$ (not depending on $x^*_{s:n}$) such that
$$P(q_1 \le Q \le q_2) = P(\ell_1 \le x^*_{s:n} \le \ell_2) = \delta.$$
Consequently, the interval $[\ell_1, \ell_2]$ is a $100\delta\%$ predictive confidence interval (PCI) for $x^*_{s:n}$. We suggest the following two pivotal quantities, based on Exp(θ) and Pareto(β, α), respectively: the ratio
$$\frac{x^*_{s:N} - x^*_{r:N}}{x^*_{r:N}} \qquad (10)$$
and
$$\Phi^{ORSS}_{r,s:N} = \frac{\ln x^*_{s:N} - \ln x^*_{r:N}}{\ln x^*_{r:N}}. \qquad (11)$$
Remark 2 It is worth noting that the pivotal quantities in (10) and (11) have the salient feature of being scale-free and shape-free, respectively. This enables us to use (10) and (11) for any Exp(θ) with an unknown scale parameter θ and any Pareto(β, α) with an unknown shape parameter α.
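To make the inversion step concrete, the following sketch solves the pivotal inequality for the future value. It assumes the ratio forms of the two pivotal quantities, namely $(x^*_s - x^*_r)/x^*_r$ for the exponential case and $(\ln x^*_s - \ln x^*_r)/\ln x^*_r$ for the Pareto case with β = 1. The function names and the numerical inputs are hypothetical placeholders; in practice $q_1$ and $q_2$ would be quantiles of the pivot's CDF.

```python
import math

def pci_exponential(x_r, q1, q2):
    """Solve q1 <= (x_s - x_r) / x_r <= q2 for x_s (requires x_r > 0)."""
    return x_r * (1.0 + q1), x_r * (1.0 + q2)

def pci_pareto(x_r, q1, q2):
    """Solve q1 <= (ln x_s - ln x_r) / ln x_r <= q2 for x_s (requires x_r > 1, beta = 1)."""
    return x_r ** (1.0 + q1), x_r ** (1.0 + q2)

# Placeholder inputs for illustration only; they are not values from the paper.
x_r, q1, q2 = 1.5, 0.2, 4.0
lo_e, hi_e = pci_exponential(x_r, q1, q2)
lo_p, hi_p = pci_pareto(x_r, q1, q2)
print("exponential PCI:", (lo_e, hi_e))
print("Pareto PCI     :", (lo_p, hi_p))
```

Both limits depend on the data only through the largest observed value $x^*_r$, which is why the limits do not involve the future value itself.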

The CDF of the pivotal statistic based on the exponential distribution
In this subsection, we derive the PDF and CDF of the pivotal statistic in (10), based on Exp(θ), when the sample size $N$ is assumed to be a positive integer-valued RV that is independent of all the lifetimes of the $n^2$ units in the original $n$ SRSs.
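Proof First, we note that the JPDF of $x^*_{r:N}$ and $x^*_{s:N}$ is
$$f_{r,s:N}(x, y) = \frac{1}{P(N \ge s)} \sum_{n=s}^{\infty} P(N = n)\, f_{r,s:n}(x, y) \qquad (13)$$
(see Raghunandanan and Patil (1972)), where the JPDF $f_{r,s:n}(x, y)$ is defined in (8).
We now derive the JPDF of $\chi_{r,s:N} = x^*_{s:N} - x^*_{r:N}$ and $\tau = x^*_{r:N}$ by the standard transformation method. Since the Jacobian of this transformation is 1, the JPDF of $(\chi_{r,s:N}, \tau)$ at $(\chi, \tau)$ is $f_{r,s:N}(\tau, \tau + \chi)$, $\chi, \tau > 0$. Applying the transformation method again, now with Jacobian $\tau$, the JPDF of the pivotal statistic $\chi_{r,s:N}/\tau$ and $\tau$ at $(w, \tau)$ is $\tau\, f_{r,s:N}(\tau, \tau(1 + w))$, $w, \tau > 0$. Consequently, the PDF of the pivotal statistic at $w > 0$ is obtained by integrating out $\tau$:
$$\int_0^{\infty} \tau\, f_{r,s:N}(\tau, \tau(1 + w))\, d\tau.$$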
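As a numerical companion to this derivation, the quantiles of the exponential-based pivot can be approximated by simulation. The sketch below is a Monte Carlo stand-in, not the exact CDF of the theorem. It assumes the pivot is the ratio $(x^*_{s:N} - x^*_{r:N})/x^*_{r:N}$, that a realized size $N = n$ means $n$ sets of $n$ units each, that draws with $N < s$ are discarded (matching the conditioning on $N \ge s$), and that ranking is perfect. Because the pivot is scale-free, sampling at θ = 1 suffices. All function names are hypothetical.

```python
import random

def orss_exp(n, rng):
    """ORSS of size n from Exp(1) under perfect ranking (an assumption)."""
    rss = []
    for i in range(n):
        srs = sorted(rng.expovariate(1.0) for _ in range(n))
        rss.append(srs[i])
    return sorted(rss)

def draw_binomial(m, p, rng):
    """Binomial(m, p) draw as a sum of m Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(m))

def simulate_pivot(r, s, draw_n, rng, reps):
    """Sorted simulated values of (x_s - x_r) / x_r, conditional on N >= s."""
    values = []
    while len(values) < reps:
        n = draw_n(rng)
        if n < s:
            continue  # the s-th ORSS value does not exist; redraw N
        x = orss_exp(n, rng)
        values.append((x[s - 1] - x[r - 1]) / x[r - 1])
    return sorted(values)

def quantile(sorted_values, p):
    """Simple empirical quantile by index (no interpolation)."""
    idx = min(len(sorted_values) - 1, max(0, int(p * len(sorted_values))))
    return sorted_values[idx]

rng = random.Random(5)
pivots = simulate_pivot(3, 4, lambda g: draw_binomial(7, 0.7, g), rng, 3000)
q_lo, q_hi = quantile(pivots, 0.05), quantile(pivots, 0.95)
print("approximate 5% and 95% pivot quantiles:", round(q_lo, 4), round(q_hi, 4))
```

The exact CDF avoids the simulation noise of this approach; the sketch is only meant to show what the mixture over $N$ does operationally.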

The CDF of the pivotal statistic based on the Pareto distribution
In this subsection, we derive the PDF and CDF of the pivotal statistic $\Phi^{ORSS}_{r,s:N}$ based on Pareto(β, α) when the sample size is assumed to be a positive integer-valued RV. For simplicity, we take β = 1.
Proof By using (9) and (13) (with $f_{r,s:N}$ and $f_{r,s:n}$ replaced by $g_{r,s:N}$ and $g_{r,s:n}$, respectively, in (13)), we get the JPDF of $x^*_{r:N}$ and $x^*_{s:N}$ under Pareto(β, α). We then obtain the JPDF of $\ln x^*_{s:N} - \ln x^*_{r:N}$ and $\ln x^*_{r:N}$ by the standard transformation method. Applying the transformation method again gives the JPDF of $\Phi^{ORSS}_{r,s:N} = (\ln x^*_{s:N} - \ln x^*_{r:N}) / \ln x^*_{r:N}$ and $\ln x^*_{r:N}$. If β = 1, integrating out $\ln x^*_{r:N}$ gives the PDF of $\Phi^{ORSS}_{r,s:N}$ as a multiple sum over the indices $l, j, v, h, \upsilon$ with coefficients $A_{l,j,v,h,\upsilon}$.
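The shape-free property of the Pareto-based pivot can be checked numerically. The sketch below assumes the log-ratio form $(\ln x^*_s - \ln x^*_r)/\ln x^*_r$ with β = 1 and generates Pareto(1, α) values by inversion, $X = (1 - U)^{-1/\alpha}$, so that $\ln X = -\ln(1 - U)/\alpha$. Feeding the same uniforms through two different values of α changes every observation but leaves the pivot unchanged, because α cancels in the ratio. The function names are hypothetical.

```python
import math
import random

def pareto_orss(u_sets, alpha):
    """ORSS from Pareto(1, alpha), built from fixed uniforms via X = (1 - U)**(-1/alpha)."""
    rss = []
    for i, us in enumerate(u_sets):
        xs = sorted((1.0 - u) ** (-1.0 / alpha) for u in us)  # rank the i-th set
        rss.append(xs[i])                                     # keep the unit of rank i+1
    return sorted(rss)

def log_ratio_pivot(orss, r, s):
    """(ln x_s - ln x_r) / ln x_r for 1-based ranks r < s."""
    return (math.log(orss[s - 1]) - math.log(orss[r - 1])) / math.log(orss[r - 1])

rng = random.Random(7)
n, r, s = 5, 3, 5
u_sets = [[rng.random() for _ in range(n)] for _ in range(n)]

phi_a = log_ratio_pivot(pareto_orss(u_sets, alpha=0.5), r, s)
phi_b = log_ratio_pivot(pareto_orss(u_sets, alpha=3.0), r, s)
print(phi_a, phi_b)  # the two values agree up to floating-point error
```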

Simulation study
In this section, we perform a Monte Carlo simulation study to examine the performance of the suggested PCIs for future ORSS values based on the two important lifetime distributions Exp(θ) and Pareto(1, α). The study is carried out for a non-random sample size and for a random sample size $N$, where $N$ is binomially or uniformly distributed. Our goal here is to show the practical importance of the methods presented in the previous sections. From Exp(1) and Pareto(1, 1) (note that, in view of Remark 2, the choice θ = α = 1 entails no loss of generality), we generate ORSSs, each of random size $N$, where $N$ is distributed as Binomial(7, p) (p = 0.4, 0.7, 0.9) or Uniform(2, 7). The probability coverages (PCs) and average interval widths (AIWs) of the resulting PCIs are displayed in Tables 1, 2 and 3 for different choices of $N$.
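The two performance measures can be estimated along the following lines. This is a simplified sketch for a fixed sample size only, not the simulation design behind Tables 1, 2 and 3: it assumes the exponential pivot is $(x^*_s - x^*_r)/x^*_r$, assumes perfect ranking, parameterizes the exponential by its mean (`scale`), and replaces the exact pivot quantiles with simulated ones. The function names and the settings in the usage lines are hypothetical.

```python
import random

def orss_exp(n, scale, rng):
    """ORSS of size n from an exponential population with mean `scale`."""
    rss = []
    for i in range(n):
        srs = sorted(rng.expovariate(1.0 / scale) for _ in range(n))
        rss.append(srs[i])
    return sorted(rss)

def pc_and_aiw(n, r, s, gamma, scale, rng, calib=4000, trials=2000):
    """Estimate probability coverage and average interval width of the PCI."""
    # Step 1: approximate the pivot quantiles at scale 1; this is valid for
    # any scale because the ratio pivot is scale-free.
    piv = []
    for _ in range(calib):
        x = orss_exp(n, 1.0, rng)
        piv.append((x[s - 1] - x[r - 1]) / x[r - 1])
    piv.sort()
    q1 = piv[int(calib * gamma / 2)]
    q2 = piv[int(calib * (1 - gamma / 2)) - 1]

    # Step 2: on fresh samples, build the interval from x_r and check
    # whether the (here known) future value x_s falls inside it.
    hits, total_width = 0, 0.0
    for _ in range(trials):
        x = orss_exp(n, scale, rng)
        lo, hi = x[r - 1] * (1 + q1), x[r - 1] * (1 + q2)
        hits += lo <= x[s - 1] <= hi
        total_width += hi - lo
    return hits / trials, total_width / trials

rng = random.Random(11)
pc, aiw = pc_and_aiw(n=5, r=3, s=5, gamma=0.10, scale=4.0, rng=rng)
print("PC :", round(pc, 3))
print("AIW:", round(aiw, 3))
```

With a nominal level of 1 − γ = 0.90, the estimated PC should fall near 0.90 up to Monte Carlo error, while the AIW scales with the population mean.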
We compare the performance of the two pivotal quantities and assess the PCIs in terms of their AIWs and PCs. Generally, the PCIs in all cases of this study are acceptable with respect to both PC and AIW. For example, in all cases of our simulation study, the PCIs for future events under RSS are comparable, in terms of PC and AIW, to those of Barakat et al. [7,8] for future events under SRS. On the other hand, we found that increasing $r$ relative to $N$ (or $n$), i.e., decreasing the number of unobserved (future) observations, yields a shorter AIW. Moreover, the results for the exponential-based pivotal quantity (10) are generally better than those for the Pareto-based pivotal quantity $\Phi^{ORSS}_{r,s:N}$ (even when $P(N = n) = 1$). Finally, when $N \sim$ Binomial(ς, p), increasing $p$ generally yields a shorter AIW.

Data analysis (Wooden toy prices)
Table 1 The PC and AIW when $N_s$ has Binomial(7, p)

Algorithm 1
1: Determine the distribution from which the RSS data will be drawn.
2: Determine the distribution of the random sample size $N$ and select its parameters.
where $U_{j:N_s}$ $(j = 1, 2, \dots, N_s)$ are independent and identically distributed uniform(0,1) RVs.

Here, we analyze real data, fitted by the exponential and Pareto distributions, to illustrate and assess the methods described in Sect. 3. All computations were carried out using Mathematica 12. The data were presented in Hand et al. (1994, p. 48). Table 4 gives the prices (in £) of 31 different children's wooden toys on sale in a Suffolk craft shop in April 1991. The quoted source uses them in exercises on the graphical presentation of data. To check whether the exponential and Pareto distributions are appropriate for describing these data, we first note that the Pareto distribution (with β = 1) can describe this data set only after a slight modification: discarding the three observations 0.5000, 0.6500 and 0.9000, which are less than 1, and rounding the value 0.9900 up to 1. Therefore, for the exponential distribution we use the data given in Table 4, while for the Pareto distribution we use the same data after the preceding modification.
We check whether the exponential and Pareto distributions are appropriate for describing these data sets by using three goodness-of-fit tests: Cramér-von Mises, Pearson $\chi^2$, and Kolmogorov-Smirnov. Via these tests, we found that Exp(θ = 0.2358) and Pareto(β = 1, α = 0.8694) are the best-fitting distributions for these data sets; see Table 5. We also use the empirical distribution function to check whether the exponential and Pareto distributions are appropriate for describing these data sets. Figures 1 and 2 show, respectively, the empirical and fitted exponential and Pareto survival functions for the given data. Visually, the empirical and fitted survival functions are very close, indicating a very good fit.

Fig. 1 The empirical and fitted exponential survival function
Fig. 2 The empirical and fitted Pareto survival function

Following Scheme 1, an SRS of size 5 is selected randomly without replacement from each of the two data sets, of size 31 (for the exponential distribution) and 28 (for the Pareto distribution), respectively. The smallest observation is measured in the first set selected from each data set. Next, a second SRS of size 5 is selected randomly without replacement from the remaining data in each of the two data sets, and the second smallest observation is measured in each of these second sets. This procedure is continued until the fifth SRS of size 5 is selected randomly without replacement from the remaining data in each of the two data sets, and the largest observation is measured in each of the last selected sets. We thus obtain the two RSSs (1.12, 0.65, 1.39, 8.69, 2.60) and (1.12, 6.24, 1.74, 2.60, 5.81) for Exp(0.2358) and Pareto(1, 0.8694), respectively.
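The selection procedure just described, drawing successive sets without replacement from a finite data set and keeping one ranked unit per set, can be sketched as follows. The list `prices` is a made-up placeholder of 31 distinct values and is not the data of Table 4; the function name is likewise hypothetical. Taking one sample of $n^2$ units without replacement and splitting it into $n$ consecutive blocks is equivalent to drawing the $n$ sets one after another from the remaining data.

```python
import random

def rss_from_finite_population(data, n, rng):
    """Draw n disjoint sets of size n without replacement; keep the i-th smallest of set i."""
    if n * n > len(data):
        raise ValueError("need at least n*n units in the data set")
    chosen = rng.sample(data, n * n)  # all n*n units, without replacement
    rss = []
    for i in range(n):
        subset = sorted(chosen[i * n:(i + 1) * n])  # the i-th selected set, ranked
        rss.append(subset[i])                       # measure only the unit of rank i+1
    return rss

# Placeholder data (hypothetical): 31 distinct values standing in for a price list.
prices = [round(0.5 + 0.4 * k, 2) for k in range(31)]

rng = random.Random(3)
rss = rss_from_finite_population(prices, 5, rng)
print("RSS :", rss)
print("ORSS:", sorted(rss))
```

With 31 (or 28) units and $n = 5$, only 25 units are used, so a few observations are never selected in any set.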
Using the ORSS procedure in Scheme 2, we predict the last $(N - 3)$ observations (which are assumed to be unobserved) based on the first three observations, for both fixed and random sample sizes; that is, $x^*_{4:5}$ and $x^*_{5:5}$ when the sample size is fixed, and $x^*_{s:N}$ $(s = 4, \dots, N)$ when the sample size $N$ is an RV. The quantile values (for γ = 0.05, 0.10), the PCIs (for $x^*_{s:N}$, $s = 4, \dots, N$), and their interval widths (IWs) are computed based on the exponential and Pareto distributions and displayed in Tables 6 and 7.
In most cases, the true values of $x^*_{s:n}$ and $x^*_{s:N}$ $(s = 4, 5)$ lie within the PCIs for the two distributions; see Table 6. Moreover, in terms of the IW, the results for the exponential distribution are generally better than those for the Pareto distribution (this may be because the exponential distribution fits better than the Pareto distribution according to the p-values in Table 5). Finally, assuming a binomially distributed sample size generally gives better results than assuming a uniformly distributed one.

Conclusion
By exploiting Theorems 3.1.1 and 3.2.1 and their corollaries, scale-free and shape-free pivotal quantities for the exponential and Pareto distributions, respectively, were used to construct prediction intervals for future observations based on ORSS. The explicit forms of the PDFs and CDFs of the pivotal quantities were derived. A simulation study was performed to examine the efficiency of the proposed methods for different choices of the random sample size. This simulation study (Tables 1, 2 and 3) revealed that, for a fixed value of $s$, the average width of the predictive confidence interval of $x^*_{s:N}$ decreases as $r$ increases. On the other hand, for a fixed value of $r$, the average width of the predictive confidence interval of $x^*_{s:N}$ increases as $s$ increases. Likewise, the average width of the predictive confidence interval of $x^*_{s:N}$ decreases as the sample size increases. In all cases, the average width of the predictive confidence interval of $x^*_{s:N}$ based on the exponential-based pivotal quantity (10) is less than the corresponding average width based on the Pareto-based pivotal quantity $\Phi^{ORSS}_{r,s:N}$. In most cases, the average width of the predictive confidence interval of $x^*_{s:n}$ for a fixed sample size is less than the corresponding average width for $x^*_{s:N}$ when the sample size is random. Throughout, the probability coverage is close to $1 - \gamma$. Finally, an example with real lifetime data was analyzed to demonstrate the applicability of the obtained results.