Predicting future failure times by using quantile regression

The purpose of this paper is to study how to predict the future failure times in a sample from the early failures (type II censored data). We consider both independent and dependent lifetimes; in both cases we assume identically distributed random variables. To predict the future failures we use quantile regression techniques, which also provide prediction regions for them. Some illustrative examples show how to apply the theoretical results to simulated and real data sets.


Introduction
When one works with a sample of several lifetimes in a study, it is usual to have censored data. Thus, we might have just the exact values of the first r failures (or survival times), while the other values are censored (type II censored data). This setting is of interest in both survival and reliability studies. An excellent review of the different situations involving ordered and censored data was given in Cramer (2021).
Several tools have been developed to use these censored data. In particular, some procedures have been studied to predict the unknown future failure times by using the exact values of the first r early failures. The main results can be seen in Basiri et al. (2016), Barakat et al. (2011), El-Adll (2011) and Lawless and Fredette (2005). Recently, Barakat et al. (2022) and Bdair and Raqab (2022) proposed two solutions based on different pivotal quantities for samples of independent and identically distributed (IID) lifetimes with a common mixture of two exponential distributions. The results obtained in the second paper were based on a beta distribution (see next section).
The purpose of this paper is to extend these results in two ways. One way is to consider the IID case with the more general Proportional Hazard Rate (PHR) Cox model. To this end we will use the pivotal quantity proposed by Bdair and Raqab (2022). The other way is to consider also the case of dependent samples. We assume ID lifetimes, but the general procedure can be applied to the general case as well (with more complicated expressions). In both cases, to provide such predictions we will use quantile regression (QR) techniques that can also be used to get prediction bands for them, see Koenker (2005) and Navarro et al. (2022). This approach allows us to get an accurate representation of the uncertainty associated with those predictions (especially when we estimate the upper extreme values). Some examples illustrate how to apply the theoretical findings.
The rest of the paper is organized as follows. The main results for the case of independent data are given in Sect. 2, while those for dependent data are in Sect. 3. The examples are placed in Sect. 4. Section 5 contains a simulation study on the coverage probabilities when we estimate the parameter of the PHR model. The conclusions and the main tasks for future research projects are explained in Sect. 6.

Independent data
Let X_1, ..., X_n be a sample of independent and identically distributed (IID) random variables with a common absolutely continuous distribution function F and with probability density function (PDF) f = F' (a.e.). Let F̄ = 1 − F be the reliability (or survival) function and let X_{1:n} < · · · < X_{n:n} be the associated ordered data (order statistics). The basic properties of order statistics can be seen in Arnold et al. (2008) and David and Nagaraja (2003).
In many cases X 1 , . . . , X n are lifetimes (or survival times) of some items. So, in practice, sometimes we just have the first r early failure times X 1:n , . . . , X r :n for some r < n. Then what we want is to predict the remaining lifetimes X r +1:n , . . . , X n:n from the early failures.
The results obtained in this section will be based on the following result extracted from Bdair and Raqab (2022) and on the well-known Markov property of the order statistics, see Arnold et al. (2008). For completeness, we include the proof here.
Proposition 2.1 (Bdair and Raqab 2022) Let W_{r,s:n} = F̄(X_{s:n})/F̄(X_{r:n}) for 1 ≤ r < s ≤ n. Then the distributions of the conditional random variables (W_{r,s:n} | X_{1:n} = x_1, ..., X_{r:n} = x_r) and (W_{r,s:n} | X_{r:n} = x_r) coincide with a beta distribution with parameters n − s + 1 and s − r.
Proof The two conditional distributions coincide by Theorem 2.4.3 in Arnold et al. (2008). From expression (2.4.3) in that book (p. 23), the PDF of (X_{s:n} | X_{r:n} = x_r) is

f_{s|r:n}(x_s | x_r) = c [F̄(x_r) − F̄(x_s)]^{s−r−1} [F̄(x_s)]^{n−s} f(x_s) / [F̄(x_r)]^{n−r}

for 1 ≤ r < s ≤ n and x_r < x_s, where c is the normalizing constant. On the other hand, if Ḡ is the reliability function of (W_{r,s:n} | X_{r:n} = x_r), we get

Ḡ(w) = Pr(F̄(X_{s:n}) > w F̄(x_r) | X_{r:n} = x_r) = Pr(X_{s:n} < x_s | X_{r:n} = x_r)

for 0 < w < 1, where x_s = F̄^{-1}(w F̄(x_r)). Therefore, its PDF g = −Ḡ' satisfies

g(w) = f_{s|r:n}(x_s | x_r) F̄(x_r)/f(x_s)

and so, by using the preceding expression for f_{s|r:n}, we obtain g(w) = c (1 − w)^{s−r−1} w^{n−s} for 0 < w < 1. Therefore, (W_{r,s:n} | X_{r:n} = x_r) has a beta distribution with parameters n − s + 1 and s − r.
The preceding proposition can be used to get the median regression curve to estimate X s:n from X r :n = x (or from X 1:n = x 1 , . . . , X r :n = x r ). It can be stated as follows.
Proposition 2.2 The median regression curve to estimate X_{s:n} from X_{r:n} = x is

m(x) = F̄^{-1}(q_{0.5} F̄(x)),   (2.1)

where q_{0.5} is the median of a beta distribution with parameters n − s + 1 and s − r.
Proof From the expressions obtained in the proof of the preceding proposition, Pr(X_{s:n} < x_s | X_{r:n} = x_r) = 0.5 is equivalent to Ḡ(F̄(x_s)/F̄(x_r)) = 0.5, where Ḡ is the reliability function of a beta distribution with parameters n − s + 1 and s − r. This leads to F̄(x_s) = q_{0.5} F̄(x_r), which is equivalent to x_s = F̄^{-1}(q_{0.5} F̄(x_r)) and gives the expression for the median regression curve.

Remark 2.1
The median q_{0.5} of the beta distribution with parameters α = n − s + 1 > 1 and β = s − r > 1 can be approximated by

q_{0.5} ≈ (α − 1/3)/(α + β − 2/3) = (n − s + 2/3)/(n − r + 1/3).

By using this approximation, the median regression curve obtained in (2.1) can be written as

m(x) ≈ F̄^{-1}( (n − s + 2/3)/(n − r + 1/3) F̄(x) )

for r + 1 < s < n. If we prefer to use the exact value of the median, we can use any statistical program to compute q_{0.5}. For example, the R code to compute it is qbeta(0.5,n-s+1,s-r).

Remark 2.2 Instead of the median approach, we could use the mean or the mode of the conditional random variable (W_{r,s:n} | X_{r:n} = x_r) in Proposition 2.1. To use the mean, as it has a beta distribution, we get

E(W_{r,s:n} | X_{r:n} = x_r) = (n − s + 1)/(n − r + 1).

Therefore, X_{s:n} can be predicted from X_{r:n} = x with the curve

m_{mean}(x) = F̄^{-1}( (n − s + 1)/(n − r + 1) F̄(x) ).

Note that this curve can be different from the classical regression curve E(X_{s:n} | X_{r:n} = x). Analogously, if we use the mode of the beta distribution (i.e. the maximum likelihood estimator for W_{r,s:n}), we get

Mode(W_{r,s:n} | X_{r:n} = x_r) = (n − s)/(n − r − 1)

for r + 1 < s < n (see p. 219, Johnson et al. 1995), that is, X_{s:n} can be predicted from X_{r:n} = x with the curve

m_{mode}(x) = F̄^{-1}( (n − s)/(n − r − 1) F̄(x) ).

In the case s = r + 1 with r < n − 1, the mode is attained at the boundary and we can use this predictor with m_{mode}(x) = x. In the case r + 1 < s = n, the predictor m_{mode}(x) = F̄^{-1}(0) coincides with the right-end point of the support (we could use it when it is finite). Finally, in the case r + 1 = s = n, the mode is not unique since (W_{r,s:n} | X_{r:n} = x_r) has a uniform distribution on (0, 1), and we can replace this predictor with m_{mode}(x) = F̄^{-1}(0.5 F̄(x)) (which coincides with the mean estimator).
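The quantities in Remarks 2.1 and 2.2 are easy to compute numerically. The following sketch uses Python's scipy as a stand-in for the paper's R call qbeta(0.5,n-s+1,s-r); the function names are ours, introduced only for illustration.

```python
# Point summaries of W_{r,s:n} ~ Beta(n-s+1, s-r) used by the predictors above.
# scipy's beta.ppf plays the role of R's qbeta.
from scipy.stats import beta

def w_median(n, r, s):
    """Exact median q_0.5 of Beta(n-s+1, s-r)."""
    return beta.ppf(0.5, n - s + 1, s - r)

def w_median_approx(n, r, s):
    """Approximation (alpha-1/3)/(alpha+beta-2/3), valid for alpha, beta > 1,
    i.e. for r+1 < s < n."""
    return (n - s + 2 / 3) / (n - r + 1 / 3)

def w_mean(n, r, s):
    """Mean (n-s+1)/(n-r+1) of the beta distribution."""
    return (n - s + 1) / (n - r + 1)

def w_mode(n, r, s):
    """Mode (n-s)/(n-r-1), for r+1 < s < n."""
    return (n - s) / (n - r - 1)
```

For n = 20, r = 1, s = 2 this returns the median 0.96418 used in Example 1, and the approximation of Remark 2.1 is accurate to about three decimals in the cases considered there.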
The same approach can be used to determine prediction bands for these future failure times from the quantiles of a beta distribution. Thus, if we want to get a prediction interval of coverage γ = β − α, where α, β, γ ∈ (0, 1) and q_α and q_β are the respective quantiles of the above beta distribution, then we use that

Pr( F̄^{-1}(q_β F̄(x)) ≤ X_{s:n} ≤ F̄^{-1}(q_α F̄(x)) | X_{r:n} = x ) = β − α.   (2.2)

For example, the centered 90% prediction band is obtained with β = 0.95 and α = 0.05 as

C_{90}(x) = [ F̄^{-1}(q_{0.95} F̄(x)), F̄^{-1}(q_{0.05} F̄(x)) ].

Sometimes one needs bottom (or lower) prediction bands starting at X_{r:n} = x. For example, the bottom 90% prediction band is obtained with β → 1 and α = 0.1 as

B_{90}(x) = [ x, F̄^{-1}(q_{0.1} F̄(x)) ].

As we will see in the examples provided in Sect. 4, these prediction bands better represent the uncertainty in the prediction of X_{s:n} from X_{r:n}. In particular, the area of these bands increases with s − r (as expected). It is worth mentioning that the quantiles q_z of a beta distribution (including the median) are available in many statistical programs; for example, in R, q_z is obtained with the code qbeta(z,a,b), with a = n − s + 1 and b = s − r in our case. These quantiles could also be obtained from the procedure given in Van Dorp and Mazzuchi (2000). Moreover, in the following proposition we show that the exponential distribution is characterized in terms of its quantile regression curves: it is the unique distribution whose quantile regression curves are lines with slope equal to one.
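The construction of these bands when F̄ is known can be sketched as follows (in Python rather than the paper's R; the function names and the exponential illustration are ours). The larger beta quantile gives the lower limit because F̄^{-1} is decreasing.

```python
# Quantile regression curves m_alpha(x) = F_bar^{-1}(q_alpha * F_bar(x)) and
# centered prediction bands as in (2.2), for a known reliability function.
import numpy as np
from scipy.stats import beta

def m_alpha(x, alpha, n, r, s, sf, isf):
    """Quantile regression curve of order alpha; sf is F_bar, isf its inverse."""
    q = beta.ppf(alpha, n - s + 1, s - r)
    return isf(q * sf(x))

def centered_band(x, n, r, s, sf, isf, gamma=0.9):
    """Centered band of coverage gamma: lower limit uses q_beta, upper q_alpha."""
    a, b = (1 - gamma) / 2, (1 + gamma) / 2
    return m_alpha(x, b, n, r, s, sf, isf), m_alpha(x, a, n, r, s, sf, isf)

# Standard exponential illustration: F_bar(t) = exp(-t), F_bar^{-1}(u) = -log(u).
sf = lambda t: np.exp(-t)
isf = lambda u: -np.log(u)
lo, hi = centered_band(1.54357, 20, 12, 13, sf, isf)  # 90% band for X_{13:20}
```

For the exponential model all these curves are x − log(q_α), i.e. lines with slope one, in agreement with Proposition 2.3.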

Proposition 2.3
Let X be a random variable with support (0, ∞), with absolutely continuous reliability function F̄ and with quantile regression curves of order α given by m_α(x) = F̄^{-1}(q_α F̄(x)). Then

m_α(x) = x + c_α

for all x > 0 and all α ∈ (0, 1), where c_α is a positive constant depending on α, if and only if X is exponentially distributed.
In practice, the common reliability function F̄ is unknown. In some cases, it can be estimated from historical and complete data sets by using non-parametric estimators (e.g. empirical or kernel estimators). In those cases, we just replace the exact unknown reliability function F̄ in the preceding expressions with its estimate.
In other cases, we have a parametric model for it, say F̄ = F̄_θ with an unknown parameter θ. In those cases, we can use X_{1:n} = x_1, ..., X_{r:n} = x_r to estimate θ. The associated likelihood function is

L(θ) = c [F̄_θ(x_r)]^{n−r} ∏_{i=1}^{r} f_θ(x_i),

where c is a constant and f_θ is the PDF associated with F̄_θ (see Arnold et al. 2008 or (5) in Bdair and Raqab 2022). By maximizing this function we get a good estimator for θ.
For example, we can assume the useful Proportional Hazard Rate (PHR) Cox model with F̄_θ = F̄_0^θ, where F̄_0 is a known baseline reliability function and θ > 0 is an unknown risk parameter. In this case, f_θ = θ F̄_0^{θ−1} f_0, and so ℓ(θ) = log L(θ) can be written as

ℓ(θ) = K + r log θ + θ [ (n − r) log F̄_0(x_r) + ∑_{i=1}^{r} log F̄_0(x_i) ],

where K is a term that does not depend on θ. Hence, its derivative is

ℓ'(θ) = r/θ + (n − r) log F̄_0(x_r) + ∑_{i=1}^{r} log F̄_0(x_i),

and the maximum likelihood estimator (MLE) for θ is

θ̂ = −r / [ (n − r) log F̄_0(x_r) + ∑_{i=1}^{r} log F̄_0(x_i) ].   (2.4)

Thus, to get the point predictions and the prediction bands for X_{s:n}, we just replace F̄_θ with F̄_{θ̂} in the preceding expressions. In the simulation study developed in Sect. 5, we will see the actual coverage probabilities for the prediction bands obtained in this way.
Some well-known distributions are included in the wide PHR model. For example, if F̄_0(t) = e^{−t} for t ≥ 0 (exponential model), then (2.4) leads to

θ̂ = r / [ (n − r) x_r + ∑_{i=1}^{r} x_i ].   (2.5)

Analogously, the mean μ = 1/θ can be estimated with

μ̂ = [ (n − r) x_r + ∑_{i=1}^{r} x_i ] / r   (2.6)

(a well-known result, see e.g. Balakrishnan et al. 2007 or Cramer 2021). It can be written as μ̂ = S_r/r, where S_r = (n − r) x_r + x_1 + · · · + x_r is known as the total time on test (TTT), see e.g. Khaminsky and Rhodin (1985) or David and Nagaraja (2003, p. 209). The estimator obtained in Khaminsky and Rhodin (1985) for μ, by using the maximum (mode) of the joint likelihood function of μ and X_{s:n} at X_{r:n}, is μ̃ = S_r/(r + 1). Thus the prediction obtained in Khaminsky and Rhodin (1985) for X_{s:n} is

m̃(x_1, ..., x_r) = x_r + (S_r/(r + 1)) log( (n − r)/(n − s + 1) ).

Note that if s = r + 1, then m̃(x_1, ..., x_r) = x_r (i.e. the maximum is attained at the boundary). Note also that this prediction does not belong to the centered prediction intervals.
From (2.6) and Remark 2.1, we get the following (estimated) median regression curve to predict X_{s:n} in the exponential distribution with unknown mean:

m(x_1, ..., x_r) = x_r − (S_r/r) log(q_{0.5}) ≈ x_r + (S_r/r) log( (n − r + 1/3)/(n − s + 2/3) )

for r + 1 ≤ s ≤ n in the first expression and r + 1 < s < n in the second. Similar curves can be obtained from Remark 2.2 by using the mean, which gives

m_{mean}(x_1, ..., x_r) = x_r + (S_r/r) log( (n − r + 1)/(n − s + 1) )   (2.7)

for r < s ≤ n, or by using the mode for r < s < n. Another classical point predictor for X_{s:n} in the one-parameter exponential family is the Best Linear Unbiased (BLU) predictor given by

m_{BLU}(x_1, ..., x_r) = x_r + (S_r/r) ∑_{j=r+1}^{s} 1/(n − j + 1),

see, e.g., David and Nagaraja (2003, p. 209). These predictors are compared in Example 1. Analogously, F̄_0(t) = 1/(1 + t) for t ≥ 0 (Pareto type II model) leads to

θ̂ = r / [ (n − r) log(1 + x_r) + ∑_{i=1}^{r} log(1 + x_i) ].   (2.9)

We can use this estimator for θ to get the different regression curves as in the exponential case (see the examples in Sect. 4).

Finally, we note that the prediction regions obtained from different quantiles can be used to get bivariate box plots and goodness-of-fit tests for the assumed reliability function F̄ (or F̄_θ), see Navarro (2020). For example, to get equal expected values as recommended in Greenwood and Nikulin (1996), we could consider the regions

R_1 = {(x, y) : x ≤ y ≤ m_{0.75}(x)},
R_2 = {(x, y) : m_{0.75}(x) < y ≤ m_{0.5}(x)},
R_3 = {(x, y) : m_{0.5}(x) < y ≤ m_{0.25}(x)},
R_4 = {(x, y) : y > m_{0.25}(x)},

where m_α(x) = F̄^{-1}(q_α F̄(x)). Note that if F̄ is correctly specified, then Pr(X_{s:n} ∈ R_i | X_{r:n} = x) = 1/4 for i = 1, 2, 3, 4.
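The exponential-case point predictors discussed above can be sketched as follows (Python stand-in for the paper's R; the function names are ours, and the BLU sum follows the standard spacing representation of exponential order statistics).

```python
# Point predictors of X_{s:n} from the first r order statistics in the
# exponential model with unknown mean, based on the TTT statistic S_r.
import numpy as np
from scipy.stats import beta

def ttt(x, n):
    """Total time on test S_r = (n-r) x_{r:n} + x_{1:n} + ... + x_{r:n}."""
    r = len(x)
    return (n - r) * x[-1] + np.sum(x)

def predict_median(x, n, s):
    """Median regression prediction x_r - (S_r/r) log q_0.5."""
    r = len(x)
    return x[-1] - (ttt(x, n) / r) * np.log(beta.ppf(0.5, n - s + 1, s - r))

def predict_kr(x, n, s):
    """Khaminsky-Rhodin MLP: x_r + S_r/(r+1) * log((n-r)/(n-s+1))."""
    r = len(x)
    return x[-1] + ttt(x, n) / (r + 1) * np.log((n - r) / (n - s + 1))

def predict_blu(x, n, s):
    """BLU predictor: x_r + (S_r/r) * sum_{j=r+1}^{s} 1/(n-j+1)."""
    r = len(x)
    return x[-1] + ttt(x, n) / r * sum(1 / (n - j + 1) for j in range(r + 1, s + 1))
```

Note that predict_kr returns x_r itself when s = r + 1, as stated above, while the median and BLU predictions are always strictly above x_r.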
If we have many values for X_{r:n} and X_{s:n}, we could consider these regions for fixed values of r, s and n. If we just have some values from a sample of size n, we could consider the regions for different values of r and s (see Sect. 4). Resampling methods could be used as well. In all these cases we use the Pearson statistic

T = ∑_{i=1}^{4} (O_i − E_i)² / E_i,

where O_i and E_i are the observed and expected data in each region, respectively, with E_i = N/4 if we have N observations and we assume that F̄ is correct (null hypothesis). Under this assumption, the asymptotic distribution of T as N → ∞ is a chi-squared distribution with 3 degrees of freedom (2 if we use the MLE of the parameter θ). Under this null hypothesis, the associated P-value is Pr(χ²_3 > T). Some illustrative examples will be provided later.
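This goodness-of-fit check is a one-line computation; a sketch (function name ours, scipy standing in for R):

```python
# Pearson chi-squared test on the four regions R_1,...,R_4: under the null,
# each region has probability 1/4, so E_i = N/4 for N observations.
from scipy.stats import chi2

def pearson_region_test(observed, df=3):
    """Return (T, p-value); use df=2 when theta is estimated by MLE."""
    N = sum(observed)
    E = N / 4
    T = sum((o - E) ** 2 / E for o in observed)
    return T, chi2.sf(T, df)

T, p = pearson_region_test([25, 30, 21, 24])  # counts from Example 1
```

With the counts 25, 30, 21, 24 of Example 1 this gives T = 1.68 and the P-value 0.64139 reported there.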

Dependent data
In this case we assume that we have a sample X_1, ..., X_n of ID random variables with a common distribution function F, but now we also assume that the data might have some kind of dependence. In many cases, this dependence is due to the fact that the units share the same environment (for example, when they are components of the same system). The dependence is modeled with a copula function C that is used to write the joint distribution as

Pr(X_1 ≤ x_1, ..., X_n ≤ x_n) = C(F(x_1), ..., F(x_n))

for all x_1, ..., x_n.
We assume that F is absolutely continuous and we again consider the ordered data X 1:n < · · · < X n:n obtained from X 1 , . . . , X n . However, in practice, we just have the first r values X 1:n < · · · < X r :n and we want to predict X s:n for s > r .
Under dependence, the order statistics do not satisfy the Markov property, that is, the distributions of (X_{s:n} | X_{1:n} = x_1, ..., X_{r:n} = x_r) and (X_{s:n} | X_{r:n} = x_r) do not coincide. To get bivariate plots, we shall use the second one, that is, we will predict X_{s:n} from X_{r:n} = x for r < s. The other data can be used to estimate the unknown parameters in the model; in this case, we might have unknown parameters both in F and in C. Also note that we can obtain and compare different predictions for X_{s:n} by using X_{1:n}, ..., X_{r:n}.
To get these predictions, we will use a distortion representation for the joint distribution of the random vector (X_{r:n}, X_{s:n}), as proposed in Navarro et al. (2022). Actually, the results for the case n = 2, r = 1 and s = 2 (paired ordered data) were already obtained there. Similar results were obtained in Navarro (2022b) for record values. The procedure is similar, but the expressions obtained here for n = 4 are more complicated. They are based on the following two facts.
The first one is that there exists a distortion function D_{r,s:n} (which depends on r, s, n and C) such that the joint distribution G_{r,s:n} of (X_{r:n}, X_{s:n}) can be written as

G_{r,s:n}(x, y) = Pr(X_{r:n} ≤ x, X_{s:n} ≤ y) = D_{r,s:n}(F(x), F(y))

for all x, y. The distortion function D_{r,s:n} is a continuous bivariate distribution function with support included in [0, 1]². Note that this representation is similar to the classical copula representation, but here D_{r,s:n} is not a copula and F does not coincide with the marginal distributions (i.e. the distributions of X_{r:n} and X_{s:n}).
The second fact is that, from the results obtained in Navarro et al. (2022), we can obtain the median regression curve and the associated prediction bands to predict X s:n from X r :n . The result can be stated as follows. It is Proposition 7 in Navarro et al. (2022). Throughout the paper we use the notation ∂ i G for the partial derivative of function G with respect to its ith variable. Similarly, ∂ i, j G represents the partial derivatives of G with respect to its ith and jth variables, and so on.
Proposition 3.1 If we assume that both F and D_{r,s:n} are absolutely continuous, then the conditional distribution function of X_{s:n} given X_{r:n} = x is

G_{s|r:n}(y | x) = Pr(X_{s:n} ≤ y | X_{r:n} = x) = ∂_1 D_{r,s:n}(F(x), F(y)) / ∂_1 D_{r,s:n}(F(x), 1).   (3.1)

Sometimes it is better to use the reliability functions instead of the distribution functions (see the examples), and we have similar results for them. For example, the joint reliability function of X_1, ..., X_n can be written as

Pr(X_1 > x_1, ..., X_n > x_n) = Ĉ(F̄(x_1), ..., F̄(x_n))

for all x_1, ..., x_n, where F̄ = 1 − F and Ĉ is another copula called the survival copula. Ĉ can be obtained from C (and vice versa).
Analogously, the joint reliability function Ḡ_{r,s:n} of (X_{r:n}, X_{s:n}) can be written as

Ḡ_{r,s:n}(x, y) = Pr(X_{r:n} > x, X_{s:n} > y) = D̄_{r,s:n}(F̄(x), F̄(y))   (3.2)

for all x, y. The distortion function D̄_{r,s:n} is also a continuous bivariate distribution function with support included in [0, 1]². It depends on r, s, n and C (or Ĉ). From this expression, the conditional reliability function can be obtained as follows.
Proposition 3.2 If we assume that both F̄ and D̄_{r,s:n} are absolutely continuous, then the conditional reliability function of X_{s:n} given X_{r:n} = x is

Ḡ_{s|r:n}(y | x) = Pr(X_{s:n} > y | X_{r:n} = x) = ∂_1 D̄_{r,s:n}(F̄(x), F̄(y)) / ∂_1 D̄_{r,s:n}(F̄(x), 1).   (3.3)

The preceding expressions can be used to solve the general case in which we want to predict X_{s:n} from X_{r:n} for 1 ≤ r < s ≤ n. To show the procedure, we choose different cases for n = 4. In all these cases we assume that the joint distribution of (X_1, X_2, X_3, X_4) is exchangeable (EXC), that is, it does not change when we permute the variables. This is equivalent to the assumption that they are ID and C (or Ĉ) is exchangeable.
In the first case, we choose r = 1 and s = 2. The result can be stated as follows.
Proposition 3.3 If both F̄ and Ĉ are absolutely continuous and Ĉ is EXC, then the conditional reliability function of X_{2:4} given X_{1:4} = x is

Ḡ_{2|1:4}(y | x) = ∂_1 Ĉ(F̄(x), F̄(y), F̄(y), F̄(y)) / ∂_1 Ĉ(F̄(x), F̄(x), F̄(x), F̄(x))   (3.4)

for y ≥ x.

Proof The joint reliability function Ḡ_{1,2:4} of (X_{1:4}, X_{2:4}) satisfies

Ḡ_{1,2:4}(x, y) = Pr(X_{1:4} > x) = Ĉ(F̄(x), F̄(x), F̄(x), F̄(x))

for y ≤ x. Analogously, for x < y, we get

Ḡ_{1,2:4}(x, y) = Pr(X_{1:4} > x, max(X_{P_1}, ..., X_{P_4}) > y),

where X_{P_i} = min_{j ∈ P_i} X_j and P_1, ..., P_4 are all the minimal path sets of X_{2:4} (see, e.g., Navarro 2022a, p. 23). In this case they are all the subsets of {1, 2, 3, 4} with cardinality 3, so there are 4 of them. Hence, by applying the inclusion-exclusion formula and by using the exchangeability assumption, we get

Ḡ_{1,2:4}(x, y) = 4 Ĉ(F̄(x), F̄(y), F̄(y), F̄(y)) − 3 Ĉ(F̄(y), F̄(y), F̄(y), F̄(y))

for x < y. Note also that D̄_{1,2:4}(u, 1) = Ĉ(u, u, u, u) and, by exchangeability, its derivative is 4 ∂_1 Ĉ(u, u, u, u). Finally, we use (3.3) to get (3.4).
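To make the dependent case concrete, here is a minimal numerical sketch (in Python) of the median prediction of X_{2:4} from X_{1:4} = x. The 4-dimensional Clayton survival copula and the standard exponential margins are our illustrative assumptions, not choices made in the paper; the conditional reliability is the ratio of ∂_1Ĉ terms of Proposition 3.3, and the prediction solves Ḡ(y | x) = 0.5 numerically.

```python
# Median prediction of X_{2:4} given X_{1:4} = x under a (hypothetical)
# 4-dim Clayton survival copula C(u) = (sum u_i^{-theta} - 3)^{-1/theta}
# with standard exponential margins F_bar(t) = exp(-t).
import numpy as np
from scipy.optimize import brentq

def g_cond(y, x, theta=2.0):
    """Conditional reliability: ratio of partial derivatives of the copula,
    d1C(F(x),F(y),F(y),F(y)) / d1C(F(x),F(x),F(x),F(x)), for y >= x."""
    u, v = np.exp(-x), np.exp(-y)
    num = u ** (-theta - 1) * (u ** -theta + 3 * v ** -theta - 3) ** (-1 / theta - 1)
    den = u ** (-theta - 1) * (4 * u ** -theta - 3) ** (-1 / theta - 1)
    return num / den

def median_prediction(x, theta=2.0):
    """Solve G_cond(y | x) = 0.5 numerically on [x, x + 50]."""
    return brentq(lambda y: g_cond(y, x, theta) - 0.5, x, x + 50.0)
```

The conditional reliability equals 1 at y = x and decreases to 0, so the root is well defined; prediction intervals are obtained the same way by solving for levels 0.05, 0.25, 0.75 and 0.95.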
In Example 4 we show how to apply this result to a specific case. In the following propositions we provide the expressions for the other cases. As the proofs are similar, they are omitted. Note that in Proposition 3.7 we use (3.1) instead of (3.3).

Proposition 3.4 If both F̄ and Ĉ are absolutely continuous and Ĉ is EXC, then the conditional reliability function of X_{3:4} given X
Proposition 3.5 If both F̄ and Ĉ are absolutely continuous and Ĉ is EXC, then the conditional reliability function of X_{4:4} given X_{1:4} = x is

Ḡ_{4|1:4}(y | x) = [ 3 ∂_1 Ĉ(F̄(x), F̄(y), F̄(x), F̄(x)) − 3 ∂_1 Ĉ(F̄(x), F̄(y), F̄(y), F̄(x)) + ∂_1 Ĉ(F̄(x), F̄(y), F̄(y), F̄(y)) ] / ∂_1 Ĉ(F̄(x), F̄(x), F̄(x), F̄(x))

for y ≥ x.

Proposition 3.6 If both F̄ and Ĉ are absolutely continuous and Ĉ is EXC, then the conditional reliability function of X_{4:4} given X_{2:4} = x is

Proposition 3.7 If both F and C are absolutely continuous and C is EXC, then the conditional distribution function of X_{4:4} given X_{3:4} = x is

G_{4|3:4}(y | x) = [ ∂_1 C(F(x), F(x), F(x), F(y)) − ∂_1 C(F(x), F(x), F(x), F(x)) ] / [ ∂_1 C(F(x), F(x), F(x), 1) − ∂_1 C(F(x), F(x), F(x), F(x)) ]

for y ≥ x.

We conclude this section by showing how to get predictions from more than one early failure. For example, if we want to predict X_{3:4} from X_{1:4} = x and X_{2:4} = y, we need a distortion representation for their joint reliability function as

Pr(X_{1:4} > x, X_{2:4} > y, X_{3:4} > z) = D̄(F̄(x), F̄(y), F̄(z)),

where F̄ is the common reliability function of X_1, X_2, X_3, X_4. Then their joint PDF is

g(x, y, z) = ∂_{1,2,3} D̄(F̄(x), F̄(y), F̄(z)) f(x) f(y) f(z),

where f = −F̄'. Analogously, the joint reliability function of X_{1:4} and X_{2:4} is

Ḡ_{1,2}(x, y) = Pr(X_{1:4} > x, X_{2:4} > y) = D̄(F̄(x), F̄(y), 1)

and their joint PDF is

g_{1,2}(x, y) = ∂_{1,2} D̄(F̄(x), F̄(y), 1) f(x) f(y).

Hence, the PDF of (X_{3:4} | X_{1:4} = x, X_{2:4} = y) is

g_{3|1,2}(z | x, y) = ∂_{1,2,3} D̄(F̄(x), F̄(y), F̄(z)) f(z) / ∂_{1,2} D̄(F̄(x), F̄(y), 1)

and the reliability function is

Ḡ_{3|1,2}(z | x, y) = ∂_{1,2} D̄(F̄(x), F̄(y), F̄(z)) / ∂_{1,2} D̄(F̄(x), F̄(y), 1).

A straightforward calculation shows that if the survival copula Ĉ is EXC, then these distortion functions and their partial derivatives can be written in terms of Ĉ. The prediction is obtained by solving (numerically) Ḡ_{3|1,2}(t | x, y) = 0.5 for given values of x and y. The prediction intervals are obtained in a similar way. Proceeding as above, we can also obtain the expression to predict X_{4:4} from X_{1:4} = x and X_{2:4} = y.

Examples
First we illustrate the IID case with simulated samples. In the first one we assume a standard exponential baseline distribution. If we want to predict X_{2:20} from X_{1:20} by assuming that F̄ is known (or that it is estimated from a preceding sample), we use the quantile regression curve given in (2.1), where q_{0.5} = 0.96418 is the median of a beta distribution with parameters n − s + 1 = 19 and s − r = 1, and thus we get the prediction for X_{2:20}. The real value is X_{2:20} = 0.02454. The 90% and 50% prediction intervals for this prediction are obtained from (2.2). To see better what happens with these predictions, we simulate N = 100 predictions of this kind, that is, 100 samples of size 20. The data are plotted in Fig. 1, left. There we can see that the prediction bands represent very well the dispersion of the majority of the data (except some extreme values). In this sample, C_50 contains 51 values and C_90 contains 90, while 5 values are above the upper limit and 5 are below the bottom limit. Moreover, if we test H_0: F̄ is correct vs H_1: F̄ is not correct by using the four regions R_1, R_2, R_3, R_4 considered in Sect. 2, we get the observed values 25, 30, 21, 24 and the T statistic value is T = 1.68. Thus, the P-value P = Pr(χ²_3 > 1.68) = 0.64139 leads to the acceptance of the exponential distribution (as expected). Note that, in practice, it is not easy to perform this test because we need the first two values of several samples of the same size (n = 20 in this example). In this case, we could also use standard quantile regression techniques (see Koenker 2005).
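The simulation just described is easy to reproduce; the following sketch (Python, with our own variable names) checks by Monte Carlo that the centered 90% interval for X_{2:20} predicted from X_{1:20} has the nominal coverage. Since W_{r,s:n} is pivotal, the coverage is distribution-free.

```python
# Monte Carlo coverage check for the 90% centered interval of X_{2:20}
# predicted from X_{1:20} in standard exponential samples of size 20.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
n, r, s, N = 20, 1, 2, 2000
q05, q95 = beta.ppf([0.05, 0.95], n - s + 1, s - r)

hits = 0
for _ in range(N):
    sample = np.sort(rng.exponential(size=n))
    x1, x2 = sample[r - 1], sample[s - 1]
    lo, hi = x1 - np.log(q95), x1 - np.log(q05)  # F_bar^{-1}(q F_bar(x)) = x - log q
    hits += lo <= x2 <= hi
coverage = hits / N  # should be close to the nominal 0.9
```

With these settings the empirical coverage is within Monte Carlo error of 0.9, in line with the counts (90 out of 100) reported for Fig. 1.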
We do the same in Fig. 1, right, but for n = 20, r = 12 and s = 13. If we just use the initial sample, the prediction for X_{13:20} = 1.71278 from X_{12:20} = 1.54357 obtained with the median curve m(x) = F̄^{-1}(q_{0.5} F̄(x)) = x − log(q_{0.5}) = x + 0.08664 is X̂_{13:20} = 1.63021. The 100 repetitions plotted in Fig. 1, right, show the underlying uncertainty of these predictions. Recall that for the exponential distribution all the curves are lines with slope one (see Proposition 2.3). In this case, we have 92 values in C_90, 56 in C_50, and 8 values out of C_90 (4 above and 4 below). The T statistic is 2.24 and its associated P-value 0.52411 leads again to accepting the (true) distribution F. The predictions will be worse for more distant future values (i.e., the dispersion will be greater). To show this fact, in Fig. 2 we plot the prediction bands for r = 12, s = 14 (left) and s = 20 (right). Of course, the coverage probabilities of these regions remain the same. The predictions obtained in the initial sample are X̂_{14:20} = 1.76813 and X̂_{20:20} = 4.03254, with prediction intervals C_90 = [1.59107, 2.17974] and C_90 = [2.70722, 6.59642], respectively. The real values are X_{14:20} = 2.22949 and X_{20:20} = 7.68743; both values are out of the corresponding interval C_90. This situation is unexpected, since in this simulation we get two failures out of two prediction intervals. However, note that these events are not independent and that this is not the case in the other simulations.
In Fig. 3 we plot the predictions for X_{s:20} (red line) jointly with the limits of the 90% (dashed blue lines) and 50% (continuous blue lines) prediction intervals in the initial simulated sample, from X_{12:20} (left) for s = 13, ..., 20 and from the preceding value X_{s−1:20} (right) for s = 2, ..., 20. In the left plot, 2-out-of-8 exact points do not belong to the 90% prediction intervals, while 5 are out of the 50% prediction intervals (the expected values are 8 · 0.1 = 0.8 and 8 · 0.5 = 4, respectively). In the right plot, 4-out-of-19 points do not belong to the 90% prediction intervals, while 11 do not belong to the 50% prediction intervals (we expect 1.9 and 9.5, respectively). Now, let us compare our method with the one based on the maximum likelihood predictions (MLP) proposed in Khaminsky and Rhodin (1985). If we know the exact distribution, for an exponential distribution this method leads to the prediction

X̂_{s:n} = x_r + μ log( (n − r)/(n − s + 1) )

for X_{s:n} at X_{r:n} = x_r when the mean μ is known. We can readily observe that if s = r + 1, the MLP method suggests predicting X_{r+1:n} with the value assumed by the r-th order statistic.
For illustrative purposes, we consider again the cases r = 12 and s = 14 or s = 20. By using the MLP method with μ = 1, the predictions are X̂_{13:20} = X_{12:20} = 1.54357, X̂_{14:20} = X_{12:20} + log(8/7) = 1.67710 and X̂_{20:20} = X_{12:20} + log 8 = 3.62301, which are all worse than the ones obtained by using the method proposed here (1.63021, 1.76813 and 4.03254, respectively). The same conclusion holds at the other points, that is, for s = 15, ..., 19. However, in this case, the curve provided by the mean gives better predictions for s = 13, ..., 16 (and worse ones for s = 17, ..., 20). In practice, we do not know the exact reliability function. If we just assume the exponential model F̄_θ(t) = e^{−θt} for t ≥ 0, with an unknown parameter θ > 0, we can use (2.5) to estimate θ. With the above sample and r = 12 we get θ̂_12 = 0.62188 (the exact value is θ = 1), and the estimate for the mean is μ̂ = 1/θ̂_12 = 1.608017. Replacing the exact reliability function F̄(t) = e^{−t} with F̄_{θ̂}(t) = e^{−0.62188·t}, we can obtain predictions for X_{s:20} from X_{12:20} similar to those obtained above. For example, for s = 13 we get the prediction X̂_{13:20} = 1.6829 for X_{13:20} = 1.71278. The estimated prediction intervals for this prediction are C_90 = [1.55388, 2.14572] and C_50 = [1.60140, 1.82222]; both intervals contain the exact value. However, in this case, as we estimate the parameter, we do not know the exact coverage probabilities for these intervals (see Sect. 5). The predictions from X_{12:20} for X_{s:20}, s = 13, ..., 20, are plotted in Fig. 4, left. The predictions for s = 14 and s = 20 are X̂_{14:20} = 1.904668 and X̂_{20:20} = 5.545868. The blue lines represent the prediction intervals. Note that all the exact values belong to the 90% intervals (dashed blue lines) and that three of them do not belong to the 50% intervals (continuous blue lines). We can compare these predictions with the ones obtained from the method proposed in Khaminsky and Rhodin (1985), or with those obtained from the mean, the mode or the BLU predictor (see Remark 2.2).
In the first case we obtain the predictions X̂_{13:20} = X_{12:20} = 1.54357,

X̂_{14:20} = X_{12:20} + [(9X_{12:20} + ∑_{i=1}^{11} X_{i:20})/13] log(8/7) = 1.74178,

X̂_{20:20} = X_{12:20} + [(9X_{12:20} + ∑_{i=1}^{11} X_{i:20})/13] log 8 = 4.63014.

These predictions are worse than those obtained with the median. However, for the other values, the predictions for s = 17 and s = 18 are now better (and those for s = 15, 16 and s = 19 are worse). In the second case, the predictions obtained with the mean by using (2.7) are better than those obtained with the median for s = 13, ..., 20 and better than those obtained with the method proposed in Khaminsky and Rhodin (1985). In Fig. 4, right, we provide the predictions for X_{r+1:20} from all the preceding values X_{1:20}, ..., X_{r:20} for r = 12, ..., 19, which are used to estimate θ. The estimates θ̂_r obtained for θ at X_{r:20} and the predictions for X_{s:20} are given in Table 1. Note that the estimates for θ are similar. The MLE for θ from the complete sample (which is not available under our assumptions) is θ̂_20 = 20/(X_1 + · · · + X_20) = 0.6113249, which is very similar to our estimates for r ≥ 12 (remember that the exact value is θ = 1). In practice, when we work with real data, the stability of these predictions might support the assumed parametric model. Also note that all the exact values belong to the 90% prediction intervals but that 4 of them do not belong to the 50% prediction intervals (as expected).

Fig. 4 Predictions (red) for X_{s:n} from X_{r:n} for n = 20, r = 12 and s = 13, ..., 20 (left) and r = 12, ..., 19 and s = r + 1 (right) for the exponential distribution in Example 1 with θ estimated at X_{r:20}. The black points are the exact values and the blue lines are the limits for the 50% (continuous lines) and the 90% (dashed lines) prediction intervals (colour figure online).
Surprisingly, in this case, the estimates obtained from X_{12:20} seem to be better than those obtained from the preceding values, but note that the lengths of the intervals in the first case are greater than those obtained in the second.
In the following example we perform a similar study by assuming a Pareto type II baseline distribution in the PHR model.

Example 2
We simulate a sample of size n = 20 from a Pareto type II distribution with parameter 2, i.e., with reliability function F̄(t) = 1/(1 + t)² for t ≥ 0. The ordered (rounded) sample values obtained are

0.01352 0.04743 0.05927 0.07542 0.16497 0.17561 0.22626 0.25210

Table 1 Predicted values X̂_{r+1:n} and estimated centered prediction intervals C_50 = [l_r, u_r] and C_90 = [L_r, U_r] for X_{r+1:n} from X_{r:n} in a standard exponential distribution; θ̂_r is the estimate of θ based on X_{1:n}, ..., X_{r:n} for r = 12, ..., 19 (columns: r, θ̂_r, L_r, l_r, X̂_{r+1:n}, X_{r+1:n}, u_r, U_r).

If we want to predict X_{2:20} from X_{1:20} by assuming that F̄ is known, we use the quantile regression curve given in (2.1). To see better what happens with these predictions, we simulate N = 100 predictions of this kind. The data are plotted in Fig. 5, left. We do the same in Fig. 5, right, but for n = 20, r = 12 and s = 13. Now the prediction for X_{13:20} = 0.59333 from X_{12:20} = 0.46332 obtained with the median curve is 0.52811, where q_{0.5} = 0.91700 is the median of a beta distribution with parameters n − s + 1 = 8 and s − r = 1. The prediction intervals for this prediction are C_90 = [0.46802, 0.76464] and C_50 = [0.48988, 0.59577]; both intervals contain the real value. The 100 repetitions of this case plotted in Fig. 5, right, show the underlying uncertainty of our predictions. In this case, we have 93 values in C_90, 55 in C_50, and 7 values out of C_90 (1 above and 6 below). The T statistic is 3.6 and its associated P-value 0.30802 leads again to accepting the (real) distribution F. The predictions will be worse for more distant future values. In Fig. 6 we plot the prediction bands for r = 12, s = 14 (left) and s = 20 (right). Of course, the coverage probabilities of these regions remain the same. The predictions obtained are X̂_{14:20} = 0.63721 and X̂_{20:20} = 4.07940, with 90% prediction intervals C_90 = [0.49850, 1.01132] and C_90 = [1.61833, 17.30424], respectively.
The real values are X 14:20 = 0.80563 and X 20:20 = 6.05006. Both values belong to the corresponding interval but the second interval is really wide.
In Fig. 7 we plot the predictions for X_{s:20} (red line) jointly with the limits of the 90% (dashed blue lines) and 50% (continuous blue lines) prediction intervals in the initial simulated sample, from X_{12:20} for s = 13, ..., 20 (left) and from the preceding value X_{s−1:20} (right) for s = 2, ..., 20. In the left plot all 8 exact points are in the 90% prediction intervals, while 3 are out of the 50% prediction intervals (the expected values are 0.8 and 4, respectively). In the right plot 2-out-of-19 points do not belong to the 90% prediction intervals, while 7 do not belong to the 50% intervals (we expect 1.9 and 9.5, respectively). Now, let us consider a more realistic scenario by just assuming the Pareto type II model F̄_θ(t) = 1/(1 + t)^θ for t ≥ 0, with an unknown parameter θ > 0. We can use (2.9) to estimate θ and, with the above sample and r = 12, we get θ̂_12 = 2.28120.
The exact value is θ = 2. Replacing the exact reliability function F̄(t) = 1/(1 + t)² with F̄_{θ̂}(t) = 1/(1 + t)^{2.28120}, we can obtain predictions for X_{s:20} from X_{12:20} similar to the ones obtained above. For example, for s = 13 we get the prediction X̂_{13:20} = 0.51997 for X_{13:20} = 0.59333. The estimated prediction intervals for this prediction are C_90 = [0.46744, 0.72438] and C_50 = [0.48658, 0.57882]. The exact value belongs to C_90 but it is out of C_50. However, in this case, as we estimate the parameter, we do not know the exact coverage probabilities for these intervals; in order to estimate them, we will perform a simulation study in Sect. 5. The predictions from X_{12:20} for X_{s:20}, s = 13, ..., 20, are plotted in Fig. 8, left, where the blue lines represent the prediction intervals. Note that all the exact values belong to the 90% intervals (dashed blue lines) and that six of them do not belong to the 50% intervals (continuous blue lines). We do the same in Fig. 8, right, but, in this case, X_{r+1:20} is predicted from the preceding value X_{r:20} by using all the preceding data values X_{1:20}, ..., X_{r:20} to estimate θ. The estimates θ̂_r for θ at X_{r:20} and the predictions for X_{r+1:20} with r = 12, ..., 19 are given in Table 2. Note that the estimates obtained for θ are similar. The MLE for θ from the complete sample (which is not available under our assumptions) is θ̂_20 = 1.98527 (remember that the exact value is θ = 2). In practice, when we work with real data, the stability of these predictions might support the assumed parametric model. Also note that 2 of the exact values do not belong to the 90% prediction intervals and that 4 of them do not belong to the 50% prediction intervals (we expect 0.8 and 4).
In the following example we analyze a real data set by assuming that the original (not ordered) data values are independent (the ordered values are always dependent).
Example 3 Let us study the real data set considered in Bdair and Raqab (2022). Let us assume that we just know the first 12 failure times and that we want to predict the future failures. If we assume an exponential distribution with unknown failure rate θ, then we estimate it from (2.5) at X_{12:20}. From this value we obtain the point predictions and prediction intervals given in Fig. 9, left. For example, the prediction for X_{13:20} = 2.38 is X̂_{13:20} = 2.13834 with prediction intervals C_90 = [1.95468, 2.797216] and C_50 = [2.02232, 2.33668]. The exact value belongs to C_90 but not to C_50. As we can see, the predictions for the last values are not very good. However, all the exact values belong to the 90% prediction intervals and only 4-out-of-8 of them do not belong to the 50% prediction intervals (as expected). Note that this plot is similar to the plot obtained in Fig. 3, left, with a sample of size 20 from an exponential distribution. If we count the data in the four regions R_1, R_2, R_3, R_4 defined in Sect. 2, we get the observed counts 3, 3, 1, 1 and the Pearson statistic value is T = 2. We can approximate its distribution with a chi-squared distribution with 2 degrees of freedom (since we estimate a parameter), and then its associated P-value is Pr(χ²_2 > 2) = 0.36788. So the exponential model cannot be rejected with the complete sample (by using this test).
To check the model with the first 12 values we could estimate θ and predict X_{r+1:20} from X_{r:20} for r = 1, . . . , 11. The predictions can be seen in Fig. 9, right. The estimations θ̂_r for θ at X_{r:20} were obtained for r = 1, . . . , 12. These estimations for θ are stable from r = 5 to r = 12. The MLE estimation with the complete sample is θ̂ = 0.51666. As we can see in that figure, the predictions are accurate. Two and six exact points do not belong to the 90% and 50% prediction intervals, respectively. These numbers are close to the expected values (0.1 · 11 = 1.1 and 0.5 · 11 = 5.5). So the exponential model cannot be rejected by using these first 12 values (the P-value obtained with the four regions in Sect. 2 is 0.20374). In the same framework, if we assume a Pareto type II distribution with unknown parameter θ, then we estimate it from (2.9). From this value we obtain the predictions and prediction intervals given in Fig. 10, left. The prediction obtained with the Pareto model for X_{13:20} = 2.38 is X̂_{13:20} = 2.30358. However, the predictions for the last values are very bad. In fact, 4 of the exact values do not belong to the 90% prediction intervals and only 1 of them belongs to the 50% prediction intervals. If we count the data in the four regions R_1, R_2, R_3, R_4 defined in Sect. 2, we get the observed counts 7, 0, 1, 0 and the Pearson statistic value is T = 17. By using again a chi-squared distribution with 2 degrees of freedom, its associated P-value is Pr(χ²_2 > 17) = 0.00020 and so the Pareto type II model is rejected with the complete sample. Let us see what happens if we check the model with the first 12 values by estimating θ and predicting X_{r+1:20} from X_{r:20} for r = 1, . . . , 11. The predictions can be seen in Fig. 10, right. In the figure, we can see that one and seven observed points do not belong to the 90% and 50% prediction intervals, respectively.
In this case, the P-value obtained with the four regions is 0.06843 and the Pareto type II model could be rejected by using these 12 values.
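Both Pearson statistics above are consistent with four equiprobable regions containing the 8 counted values, so that each region has expected count 2; a minimal check under this assumption:

```python
import math

def pearson_T(observed):
    """Pearson chi-squared statistic for counts in equiprobable regions."""
    e = sum(observed) / len(observed)   # expected count per region (here 8/4 = 2)
    return sum((o - e) ** 2 / e for o in observed)

def pvalue_chi2_df2(T):
    """P(chi^2_2 > T) = exp(-T/2): the chi-squared(2) survival function."""
    return math.exp(-T / 2)

T_exp = pearson_T([3, 3, 1, 1])  # exponential fit
T_par = pearson_T([7, 0, 1, 0])  # Pareto type II fit
```

This reproduces T = 2 with P-value 0.36788 for the exponential model and T = 17 with P-value 0.00020 for the Pareto type II model.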
We conclude this section with an example of four dependent data values. They can represent the values in a small sample, but they could also be the lifetimes of the four engines in a plane. In the latter case, it is very important to predict the future failure times!

Example 4 First we consider the case r = 1, s = 2 and n = 4, that is, we want to predict X_{2:4} from X_{1:4} = x. Let us assume that (X_1, X_2, X_3, X_4) has the following Farlie-Gumbel-Morgenstern (FGM) survival copula

Ĉ(u_1, u_2, u_3, u_4) = u_1 u_2 u_3 u_4 [1 + θ (1 − u_1)(1 − u_2)(1 − u_3)(1 − u_4)]

for u_1, u_2, u_3, u_4 ∈ [0, 1] and θ ∈ [−1, 1]. The independent case is obtained when θ = 0. Then we can compute the conditional distribution of X_{2:4} given X_{1:4} = x for all x. Hence, from (3.4), we get the conditional quantile function for all x ≤ y such that F̄³(x) + θ F̄³(x) F³(x)(1 − 2F̄(x)) > 0.

Fig. 11 Predictions (red) for X_{s:n} from X_{r:n} for n = 4, r = 1 and s = 2 and θ = −1, 0, 1 (left), jointly with the values (black points) from 100 simulated samples from a standard exponential distribution and an FGM survival copula with θ = 1 (see Example 4). The blue lines represent the limits for the 50% (continuous lines) and the 90% (dashed lines) prediction intervals. The green lines are the curves for the independent case. In the right plot we have the curves when the mean of the exponential distribution is estimated from the minimum X_{1:4} in the first sample (colour figure online)

Unfortunately, we cannot obtain an explicit expression for its inverse. However, we can plot the level curves of this function to get the median regression curve (level 0.5) and the limits of the centered prediction regions C_90 (levels 0.05, 0.95) and C_50 (levels 0.25, 0.75). They are plotted in Fig. 11, left, jointly with the values obtained from 100 samples of size n = 4 from a standard exponential distribution and an FGM survival copula with θ = 1. The (rounded) ordered values obtained in the first sample are 0.07086, 0.32313, 0.88360, 1.66760.
The method used to generate these sample values is explained in the Appendix. Note that the data values are perfectly represented by these prediction regions. In the right plot we also provide the curves for θ = 0 (green lines) and θ = −1. As we can see, the changes are really minor. This is due to the fact that the FGM copula only allows for a weak dependence. The curves might differ more under other dependence models (copulas).
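The Appendix gives the exact generation method; as a generic alternative (our own sketch, not the authors' procedure), one can exploit the fact that the four-dimensional FGM copula has density c(u) = 1 + θ(1 − 2u_1)(1 − 2u_2)(1 − 2u_3)(1 − 2u_4), which is bounded by 2 for |θ| ≤ 1, so rejection sampling applies:

```python
import math
import random

rng = random.Random(1)

def sample_fgm4(theta):
    """Draw u = (u1, u2, u3, u4) from the 4-dimensional FGM copula by
    rejection: the copula density c(u) = 1 + theta * prod(1 - 2*u_i)
    is bounded by 2 for |theta| <= 1."""
    while True:
        u = [rng.random() for _ in range(4)]
        c = 1 + theta * math.prod(1 - 2 * ui for ui in u)
        if 2 * rng.random() < c:   # accept with probability c(u) / 2
            return u

def sample_lifetimes(theta):
    """Standard exponential margins coupled by the FGM survival copula:
    if u is distributed as the survival copula, then X_i = -log(u_i)
    satisfies Fbar(X_i) = u_i."""
    return sorted(-math.log(ui) for ui in sample_fgm4(theta))

samples = [sample_lifetimes(1.0) for _ in range(2000)]
```

The acceptance rate is at least 1/2, so this is cheap even for repeated simulation.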
If we assume that the parameters in the model are unknown, we might try to estimate them. Taking into account the preceding comments, instead of estimating θ, we could just plot the curves for the extreme values θ = −1, 1. The exact curves will lie between them. So we just need to estimate the parameter λ = 1 of the exponential distribution. For this purpose, in practice, we have just the sample minimum X_{1:4}. Its reliability function is

F̄_{1:4}(t) = Ĉ(F̄(t), F̄(t), F̄(t), F̄(t)) = F̄⁴(t) + θ F̄⁴(t) F⁴(t)

for t ≥ 0. Hence, if F̄(t) = exp(−t/μ) with μ = 1/λ, then the mean of X_{1:4} is E(X_{1:4}) = μ(70 + θ)/280. Therefore, μ can be estimated with μ̂ = 280 X_{1:4}/(70 + θ). For θ = 1, we get μ̂ = 3.94366 X_{1:4} and, for θ = −1, μ̂ = 4.05797 X_{1:4}. In our first simulated sample, we obtain the value X_{1:4} = 0.07086 and so μ̂ ∈ [0.27945, 0.28755].
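The constant 280/(70 + θ) follows from E(X_{1:4}) = ∫₀^∞ F̄_{1:4}(t) dt = μ(70 + θ)/280 under the exponential margins above. A numerical check of this closed form (the trapezoidal step and the cutoff are arbitrary choices):

```python
import math

def sf_min(t, mu, theta):
    """Reliability of the minimum, Fbar_{1:4}(t) = Fbar^4(t) + theta*Fbar^4(t)*F^4(t),
    with exponential margins Fbar(t) = exp(-t/mu)."""
    s = math.exp(-t / mu)
    return s ** 4 * (1 + theta * (1 - s) ** 4)

def mean_min(mu, theta, h=1e-4, T=20.0):
    """E(X_{1:4}) as the integral of the reliability function (trapezoidal rule)."""
    n = int(T / h)
    acc = 0.5 * (sf_min(0.0, mu, theta) + sf_min(T, mu, theta))
    acc += sum(sf_min(i * h, mu, theta) for i in range(1, n))
    return acc * h

# Closed form: E(X_{1:4}) = mu * (70 + theta) / 280, hence
# mu_hat = 280 * X_{1:4} / (70 + theta):
#   theta = 1  -> mu_hat = (280/71) * X_{1:4} = 3.94366 * X_{1:4}
#   theta = -1 -> mu_hat = (280/69) * X_{1:4} = 4.05797 * X_{1:4}
```

The numerical integral agrees with the closed form for both extreme values of θ.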
As we are assuming that θ is unknown, we can use the average of these two estimations to approximate μ with 0.2835. By using this value, we get the curves plotted in Fig. 11, right. Although the estimation of μ = 1 is very bad (since we just have one data point) and the curves are far from the observed sample values (plotted in Fig. 11, left), note that the value X_{2:4} belongs to the 90% prediction interval obtained from X_{1:4}. Along the same lines and by using the same copula, we can consider other cases. If we want to predict X_{3:4} from X_{1:4} = x, we can use (3.5) to get the corresponding conditional distribution. As in the preceding case, we plot the level curves of this function to get the median regression curve (level 0.5) and the limits of the centered prediction regions C_90 (levels 0.05, 0.95) and C_50 (levels 0.25, 0.75). They are plotted in Fig. 12, left, jointly with the values from 100 samples of size n = 4 from a standard exponential distribution and an FGM survival copula with θ = 1. In this case we do not plot the curves for θ = 0 and θ = −1 since the changes are again really minor.
Furthermore, if we want to predict X_{4:4} from X_{1:4} = x, we can use (3.6) to get the corresponding conditional distribution for all valid x ≤ y. We plot the level curves of this function to get the median regression curve (level 0.5) and the limits of the centered prediction regions C_90 (levels 0.05, 0.95) and C_50 (levels 0.25, 0.75). They are plotted in Fig. 12, right, jointly with the values from 100 samples of size n = 4 from a standard exponential distribution and an FGM survival copula with θ = 1. Again, we do not plot the curves for θ = 0 and θ = −1 since the changes are really minor.
Proceeding as above, we can predict X_{2:4}, X_{3:4} and X_{4:4} by using the median regression curve, and we can obtain the limits of the centered prediction regions C_90 and C_50. For example, in the first sample, the prediction obtained for X_{2:4} = 0.32313 from X_{1:4} = 0.07086 is X̂_{2:4} = 0.29708, with the corresponding prediction intervals. In Fig. 13, left, we plot these predictions (red) for X_{2:4}, X_{3:4}, X_{4:4} from X_{1:4} jointly with the exact values (black points) in the first simulated sample from a standard exponential distribution and an FGM survival copula with θ = 1.

Simulation study
In this section we present a simulation study to estimate the coverage probabilities of the prediction regions when we estimate the parameter in the PHR model for samples of IID random variables. First, let us assume the exponential model with parameter θ = 1, that is, F̄(t) = e^{−t} for t ≥ 0. We generate N samples of size 20 and, by supposing that the parameter θ > 0 is unknown, we use X_{12:20} to predict X_{s:20} for s = 13, 14, . . . , 20. For each sample we use (2.5) with r = 12 to estimate θ, obtaining θ̂_j, j = 1, . . . , N. Replacing the exact survival function F̄(t) = e^{−t} with F̄_{θ̂_j}(t) = e^{−θ̂_j t}, j = 1, . . . , N, we can obtain predictions for X_{s:20} from X_{12:20} for each simulated sample. Our purpose is to estimate the coverage probabilities of the estimated prediction intervals C_90 and C_50 for varying s and N. The results are listed in Table 3.
Furthermore, we perform the same study by choosing as baseline distribution for the PHR model the Pareto type II distribution, that is, F̄(t) = 1/(1 + t)^θ for t ≥ 0. We choose θ = 2. For each sample we use (2.9) with r = 12 to estimate θ, obtaining θ̂_j, j = 1, . . . , N. Replacing the exact reliability function F̄(t) = 1/(1 + t)^2 with F̄_{θ̂_j}(t) = 1/(1 + t)^{θ̂_j}, j = 1, . . . , N, we obtain predictions for X_{s:20} from X_{12:20} for each simulated sample. The results on the coverage probabilities of the estimated prediction intervals are given in Table 4. As we can see, in both cases, the coverage probabilities are a little below the expected values (for the exact model), especially when we predict the last value X_{20:20} from X_{12:20}. Note that in both models (exponential and Pareto), we have some extreme upper values (especially in the Pareto model).
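A compact version of this simulation study can be sketched as follows for the exponential case. The replication count below is illustrative, the estimator is the standard type II censored MLE, and the conditional quantiles come from the exact conditional law of X_{s:n} given X_{r:n} (Markov property of order statistics) with θ replaced by its estimate:

```python
import math
import random

rng = random.Random(12345)

def mle_exp_type2(first_r, n):
    """Type II censored MLE of theta for Fbar(t) = exp(-theta * t)."""
    r = len(first_r)
    return r / (sum(first_r) + (n - r) * first_r[-1])

def cond_cdf(y, x, r, s, n, Sbar):
    """P(X_{s:n} <= y | X_{r:n} = x): binomial tail in p = 1 - Sbar(y)/Sbar(x)."""
    m, j = n - r, s - r
    p = 1.0 - Sbar(y) / Sbar(x)
    return sum(math.comb(m, k) * p**k * (1 - p)**(m - k) for k in range(j, m + 1))

def cond_quantile(alpha, x, r, s, n, Sbar):
    """Invert the conditional cdf in y by bisection."""
    lo, hi = x, x + 200.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, x, r, s, n, Sbar) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

def coverage(N, n=20, r=12, s=13, level=0.90):
    """Estimated coverage of the centered prediction interval for X_{s:n}
    when theta is estimated from the first r order statistics."""
    a = (1 - level) / 2
    hits = 0
    for _ in range(N):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        theta = mle_exp_type2(sample[:r], n)
        Sbar = lambda t, th=theta: math.exp(-th * t)
        lo = cond_quantile(a, sample[r - 1], r, s, n, Sbar)
        hi = cond_quantile(1 - a, sample[r - 1], r, s, n, Sbar)
        hits += lo <= sample[s - 1] <= hi
    return hits / N

cov90 = coverage(N=200)
```

As in Tables 3 and 4, the estimated coverage tends to fall slightly below the nominal level because θ̂ replaces θ.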

Conclusions
We have shown that quantile regression techniques can capture the underlying uncertainty in the prediction of future sample values. We consider both the IID and the DID (dependent and identically distributed) cases. In both cases, if we know the underlying model (or it can be estimated from preceding right-censored samples), the predictions are accurate. Moreover, we can perform goodness-of-fit tests to check the model. If the model contains unknown parameters, then they should be estimated from the available values. In those cases, the predictions are not so accurate. However, the coverage probabilities obtained in the simulation study when the parameters are estimated are close to the expected ones. In the IID case we also provide some point predictors based on the median, the mean or the mode, which are compared with classical estimators. However, we recommend using prediction intervals instead of point predictions, since they better represent the uncertainty in the future failure times.
There are several tasks for future research. The main one could be the estimation of the unknown parameters in other parametric models (different from the PHR model) or in other dependence models (copulas). The expressions for the dependent case with n > 4 and/or for non-exchangeable joint distributions (copulas) should also be obtained by following a procedure similar to the one proposed here. In this case, the estimation of the parameters in the model (both in the distribution and in the copula) is a more complex (but important) task.