Maximum observed likelihood prediction of future record values

Point prediction of future upper record values is considered. For an underlying absolutely continuous distribution with strictly increasing cumulative distribution function, the general form of the predictor obtained by maximizing the observed predictive likelihood function is established. The results are illustrated for the exponential, extreme-value and power-function distributions, and the performance of the obtained predictors is compared to that of maximum likelihood predictors on the basis of the mean squared error and Pitman's measure of closeness. For exponential and extreme-value distributions, it is shown that, under slight restrictions, the maximum observed likelihood predictor outperforms the maximum likelihood predictor in terms of both performance criteria.

Record values, first studied by Chandler (1952), provide a natural model for the sequence of successive extremes in an i.i.d. sequence of random variables. In mathematical reliability theory, record values appear in the context of minimal repair systems (see Gupta and Kirmani 1988). There is also a close connection between the occurrence times of a nonhomogeneous Poisson process (NHPP) and record values. Indeed, by the results in Gupta and Kirmani (1988), under very mild conditions, the epoch times of an NHPP and record values are equal in distribution.
The problem of predicting a future record value $R_s$ based on the observed record values $R_1, \dots, R_r$, $r < s$, has been studied by several authors. As far as non-Bayesian prediction is concerned, most of the proposed predictors have been derived by applying well-known prediction procedures that have previously been applied in the context of other models of ordered data. Specifically for one-sample prediction of record values, we refer to Raqab (2007), where the best linear unbiased predictor, the best linear equivariant predictor, the maximum likelihood predictor (MLP) as well as the conditional median predictor of the $s$th record value $R_s$ based on a Type II left-censored sample from the two-parameter exponential distribution are derived. His results supplement and generalize the results of Ahsanullah (1980), Basak and Balakrishnan (2003) and Nagaraja (1986, Section 4). A comparative study of several predictors of the $s$th record value $R_s$ based on the first $r$ observed record values from the one-parameter exponential distribution can be found in Awad and Raqab (2000). Maximum likelihood prediction of future Pareto record values is studied in Raqab et al. (2007). Moreover, since the record value model is contained in the generalized order statistics model (see Kamps 1995), all results pertaining to prediction of future generalized order statistics can be specialized to solve the prediction problem for record values (see, e.g., Burkschat 2009). Bayesian prediction methods for future record values were first discussed by Dunsmore (1983) and have subsequently been applied to various distribution families. For these results, we refer to, e.g., Madi and Raqab (2004), Ahmadi and Doostparast (2006) and Nadar and Kızılaslan (2015).
As for likelihood-based prediction methods specifically, the maximum likelihood prediction procedure (see Kaminsky and Rhodin 1985) has received a great deal of attention in the literature. Applying a likelihood-based prediction method in the context of an ordered data model has so far been synonymous with applying the maximum likelihood prediction procedure. In this paper, an alternative likelihood-based predictor, the maximum observed likelihood predictor (MOLP), is proposed and subsequently applied to predict future record values. Contrary to the maximum likelihood prediction procedure, the new method allows one to derive the general form of the predictor as a function of the estimator of the underlying distributional parameters. Moreover, the obtained predictors outperform the MLP, which is illustrated by comparing MOLPs and MLPs of future exponential and extreme-value record values in terms of mean squared error and Pitman's measure of closeness. For properties of the MOLP when the underlying distribution is assumed to be a Pareto, Lomax or Weibull distribution, we refer to Volovskiy (2018).

Maximum observed likelihood prediction procedure
Let $X$, $Y$ be absolutely continuous random variables with values in $\mathbb{R}^p$ and $\mathbb{R}$, respectively, and joint probability density function $f_\theta^{X,Y}$ known up to a parameter vector $\theta \in \Theta \subseteq \mathbb{R}^d$. The random variable $X$ models the observed data, while $Y$ stands for a not-yet-observed value to be predicted using a predictor $\pi(X)$. In non-Bayesian prediction setups, a natural approach to finding a predictor for $Y$ based on $X$ has been to define a generalized (parametric) likelihood function that can be used to solve statistical problems involving both fixed unknown parameters and unobserved random variables. In Bayarri et al. (1987), the authors consider the functions $L_{rv}(y, \theta \mid x) = f_\theta^{X,Y}(x, y)$ and $L_{obs}(y, \theta \mid x) = f_\theta^{X \mid Y}(x \mid y)$, in what follows called the predictive likelihood function (PLF) and the observed predictive likelihood function (OPLF), respectively, as possible extensions of the classical parametric likelihood function and implement the maximum likelihood principle to obtain an estimate for $\theta$ and a prediction value for $Y$. They compare the proposed likelihood functions by comparing the estimates and prediction values obtained from them. By way of a slightly contrived example (see Bayarri et al. 1987, Section 2), the authors demonstrate that the maximum likelihood method applied to either $L_{rv}$ or $L_{obs}$ does not yield reasonable results in general, which led them to conclude that no general definition of a likelihood function can be given, only to argue in Bayarri and DeGroot (1988) in favor of $L_{obs}$.
There has also been an attempt to justify the use of either L rv or L obs for deriving predictive inferences by using arguments from the theoretical foundations of statistical inference. In parametric inference, Fisher's likelihood function is pivotal to the formulation of the likelihood principle, and it is Birnbaum's theorem (see Birnbaum 1962), which establishes the equivalence of the likelihood principle and the sufficiency and conditionality principles, that can be seen as providing the theoretical justification for the choice of Fisher's likelihood function as a basis for parametric statistical analysis. For an in-depth discussion of the likelihood principle and Birnbaum's theorem, we refer the reader to the monograph by Berger and Wolpert (1988). It has been recognized that Birnbaum's result can serve as a guidance in generalizing the parametric likelihood beyond the case of parametric inference by requiring that the likelihood function be specified in such a way that the equivalence of the likelihood principle and the suitably modified sufficiency and conditionality principles continues to hold. This program was realized by Bjørnstad (1996) and Nayak and Kundu (2002). However, while the analysis in Bjørnstad (1996) provides a justification for L rv as a general specification of the likelihood function, the discussion in Nayak and Kundu (2002) favors L obs . Evidently, the generalizations of the sufficiency and conditionality principles proposed by Bjørnstad (1996) and Nayak and Kundu (2002) do not accord.
Certainly, the example in Bayarri et al. (1987, Section 2) can be understood to advise caution against careless application of classical likelihood methods when the likelihood function is extended to include random variables. However, one may also take the view that likelihood functions L rv and L obs are tools, albeit not of universal applicability, that can serve to derive predictors. This intention was behind the introduction of the maximum likelihood prediction procedure by Kaminsky and Rhodin (1985), which we briefly recall in the following definition.
Since its introduction, the maximum likelihood prediction procedure has become a standard method in models of ordered data. It was applied to the prediction of record values (see, e.g., Basak and Balakrishnan 2003), prediction of failure times of censored units in a progressive censoring procedure (see, e.g., Balakrishnan and Cramer 2014, Chapter 16) and generalized order statistics (see, e.g., Raqab 2001). Further references will be provided in Sect. 3. Apart from prediction based on ordered data, the method was applied to solve prediction problems in actuarial mathematics (see Kaminsky 1987). Contrary to prediction based on $L_{rv}$, to the best of our knowledge, prediction based on maximization of $L_{obs}$ has received no attention apart from the articles focused on the foundations of statistics cited above. We propose to reconsider the approach by introducing the following $L_{obs}$-based prediction procedure: a predictor $\pi_{MOLP}(X)$ of $Y$ and an estimator $\hat\theta_{MOL}(X)$ of $\theta$ are called the maximum observed likelihood predictor (MOLP) of $Y$ and the predictive maximum observed likelihood estimator (PMOLE) of $\theta$, respectively, if, for the observed value $x$ of $X$, they jointly maximize $L_{obs}(y, \theta \mid x)$ over $y$ and $\theta \in \Theta$.
The parameter $\theta$ may (partly) disappear from the function $L_{obs}$ (see an example in Bayarri et al. 1987, Section 2), in which case $L_{obs}$ does not provide guidance as to the (complete) choice of an estimator for $\theta$, i.e., $\hat\theta_{MOL}$ may, except for the restriction that it takes values in $\Theta$, be (in part) arbitrary. Thus, apart from the situation when the set of parameters present in $L_{obs}$ coincides with the subset of parameters that are of inferential interest, in general, the predictive maximum observed likelihood estimator of $\theta$ merely serves to ensure that the predictor of $Y$ exists uniquely. A similar conclusion is also valid with respect to the predictive maximum likelihood estimator (PMLE), which stems from the fact that, in general, $\hat\theta_{ML}$ is determined by $Y$ and thus can hardly be considered a "sound" estimator of $\theta$. The maximum observed likelihood prediction procedure will be applied to the prediction problem of future record values in the following section.

Prediction of future record values
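Let $(R_n)_{n=1}^{\infty}$ be the sequence of record values in a sequence of i.i.d. random variables with absolutely continuous cdf $F_\theta$ and density function $f_\theta$, $\theta \in \Theta \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$. In the present section, we aim to provide sufficient conditions for the existence of the MOLP of $R_s$ based on $\mathbf{R} = (R_1, \dots, R_r)$, $r, s \in \mathbb{N}$, $r < s$.
It turns out that, owing to the structure of the observed predictive likelihood function, the problem of jointly finding the MOLP and the PMOLE can be reduced to that of finding the PMOLE. In order to derive the observed predictive likelihood function, we will need explicit expressions for the density functions of the distributions of $\mathbf{R}$ and $R_s$ as well as of the conditional distribution of $R_s$ given $R_r = x$. These are summarized in the following lemma (see, e.g., Arnold et al. 1998). In what follows, we use the notational convention that for an interval $I \subseteq \mathbb{R}$ and $n \in \mathbb{N}$, $I^n_< = \{(x_1, \dots, x_n) \in I^n \mid x_1 < \dots < x_n\}$. Moreover, the left and right endpoints of the support of a distribution with cdf $F$ are denoted, respectively, by $\alpha(F)$ and $\omega(F)$. Throughout, for a cdf $F$ with density function $f$, $h$ denotes the hazard rate function defined by $h(x) = f(x)/(1 - F(x))$ for $x$ with $F(x) < 1$.

Lemma 3.1 Let $(R_n)_{n=1}^{\infty}$ be the sequence of record values in a sequence of i.i.d. random variables with absolutely continuous cdf $F$ and density function $f$. Then, for $r, s \in \mathbb{N}$, $r < s$, the density functions of the distributions of $\mathbf{R} = (R_1, \dots, R_r)$, $R_s$ as well as the conditional distribution of $R_s$ given $R_r = x$, $x \in (-\infty, \omega(F))$, are given by

$$f^{\mathbf{R}}(\mathbf{x}) = \Big(\prod_{i=1}^{r-1} h(x_i)\Big) f(x_r), \quad \mathbf{x} = (x_1, \dots, x_r) \in \mathbb{R}^r_<,$$

$$f^{R_s}(y) = \frac{(-\ln(1 - F(y)))^{s-1}}{(s-1)!}\, f(y), \quad y \in \mathbb{R},$$

$$f^{R_s \mid R_r = x}(y) = \frac{\big(\ln(1 - F(x)) - \ln(1 - F(y))\big)^{s-r-1}}{(s-r-1)!}\, \frac{f(y)}{1 - F(x)}, \quad y > x.$$

Since, by assumption, for all $\theta \in \Theta$, the underlying cdf $F_\theta$ is continuous, the sequence $(R_n)_{n=1}^{\infty}$ possesses the Markov property. Hence, by adopting the convention that $0/0 := 0$, the observed predictive likelihood function of $R_s$ and $\theta$ given $\mathbf{R} = \mathbf{x}$ is given by

$$L_{obs}(x_s, \theta \mid \mathbf{x}) = \frac{f_\theta^{\mathbf{R}}(\mathbf{x})\, f_\theta^{R_s \mid R_r = x_r}(x_s)}{f_\theta^{R_s}(x_s)}.$$

From this expression along with Lemma 3.1, we obtain that for a given $\mathbf{x} \in \mathbb{R}^r_<$, and $x_s \in \mathbb{R}$, $\theta \in \Theta$, the observed predictive likelihood function satisfies

$$L_{obs}(x_s, \theta \mid \mathbf{x}) = \frac{(s-1)!}{(s-r-1)!}\, \frac{\prod_{i=1}^{r} h_\theta(x_i)}{(-\ln(1 - F_\theta(x_s)))^{r}}\, \big(1 - G_{\theta, x_s}(x_r)\big)^{s-r-1}\, \mathbb{1}_{(x_r, \omega(F_\theta))}(x_s), \quad (1)$$

where for $x \le y < \omega(F_\theta)$, $G_{\theta,y}(x) = \ln(1 - F_\theta(x))/\ln(1 - F_\theta(y))$.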

Remark 3.1 (i) For $x_s \in (x_r, \omega(F_\theta))$, the OPLF can be rewritten as

$$L_{obs}(x_s, \theta \mid \mathbf{x}) = \frac{(s-1)!}{(s-r-1)!}\, \Big(\prod_{i=1}^{r} g_{\theta, x_s}(x_i)\Big) \big(1 - G_{\theta, x_s}(x_r)\big)^{s-r-1}, \quad (2)$$

where $g_{\theta,y}(x) = h_\theta(x)/(-\ln(1 - F_\theta(y)))$, $x < y$, denotes the density function of $G_{\theta,y}$. This representation is related to the fact that the conditional distribution of $\mathbf{R} = (R_1, \dots, R_r)$ given $R_s = y$, $y \in (\alpha(F_\theta), \omega(F_\theta))$, coincides with the distribution of the first $r$ ordinary order statistics from a sample of $s - 1$ i.i.d. random variables with cdf $G_{\theta,y}$ (see also Keseling 1999, Remark 1.17). From representation (2), we see that a sub-parameter $\theta_i$ of the parameter vector $\theta = (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$ is not estimable by the method of observed predictive likelihood maximization if it does not appear in any of the functions $G_{\theta,y}$, $y \in (\alpha(F_\theta), \omega(F_\theta))$. As an example, consider the two-parameter exponential distribution. Then, $\theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+$ and, for $y > \mu$, $G_{\theta,y}(x) = (x - \mu)/(y - \mu)$ for $\mu \le x \le y$, and $G_{\theta,y}(x) = 1$ for $x > y$. Consequently, the method of observed predictive likelihood maximization cannot produce a meaningful estimator for the sub-parameter $\sigma$. However, this parameter dropout does not affect the usefulness of the method as a vehicle for deriving predictors for future record values, as evidenced by Theorem 3.1.
(ii) For $k \in \mathbb{N}$, the observed predictive likelihood function (1) coincides with the observed predictive likelihood function of $R_s^{(k)}$ and $\theta$ given $(R_1^{(k)}, \dots, R_r^{(k)})$, where $(R_n^{(k)})_{n \in \mathbb{N}}$ denotes the sequence of $k$th record values in a sequence of i.i.d. random variables with cdf $F_\theta$ (see Dziubdziela and Kopociński 1976). This follows from the fact that $k$th record values in a sequence of i.i.d. random variables with cdf $F$ are equal in distribution to record values in a sequence of i.i.d. random variables with cdf $F_{1:k} = 1 - (1 - F)^k$ (see Arnold et al. 1998, p. 43).
In the following, for $\mathbf{x} \in \mathbb{R}^r_<$, $\Psi(\cdot \mid \mathbf{x})$ denotes the function given by

$$\Psi(\theta \mid \mathbf{x}) = \frac{\prod_{i=1}^{r} h_\theta(x_i)}{(-\ln(1 - F_\theta(x_r)))^{r}}, \quad \theta \in \Theta. \quad (3)$$

In Theorem 3.1, the assumption $r + 1 < s$ is made. This is due to the fact that, for $s = r + 1$, if a MOLP exists, it is necessarily given by $\pi^{(s)}_{MOLP} = R_r$. Thus, in this case, the maximum observed likelihood prediction method does not produce a reasonable predictor. We refer to Remark 3.2 (iii) for more details.
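Theorem 3.1 For $s \ge 3$, let $R_1, \dots, R_s$ be the first $s$ record values in a sequence of i.i.d. random variables with cdf $F_\theta$, which, for all $\theta \in \Theta$, is assumed to be absolutely continuous and strictly increasing on its support. Moreover, for $r \in \mathbb{N}$, $r < s - 1$, let $\mathbf{R} = (R_1, \dots, R_r)$ and let $\hat\theta : \mathbb{R}^r_< \to \Theta$ be a measurable function satisfying

$$\Psi(\hat\theta(\mathbf{x}) \mid \mathbf{x}) = \sup_{\theta \in \Theta} \Psi(\theta \mid \mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^r_<, \quad (4)$$

where $\Psi$ is given in (3). Then, a MOLP $\pi^{(s)}_{MOLP}$ of $R_s$ and a PMOLE $\hat\theta_{MOL}$ of $\theta$ based on $\mathbf{R}$ are given by

$$\pi^{(s)}_{MOLP} = F^{-1}_{\hat\theta(\mathbf{R})}\Big(1 - \big(1 - F_{\hat\theta(\mathbf{R})}(R_r)\big)^{(s-1)/r}\Big), \qquad \hat\theta_{MOL} = \hat\theta(\mathbf{R}). \quad (5)$$

Proof The aim is to maximize the observed predictive likelihood function of $R_s$ and $\theta$ given $\mathbf{R} = \mathbf{x}$, $\mathbf{x} \in \mathbb{R}^r_<$. Fix some $\mathbf{x} \in \mathbb{R}^r_<$. Then, by the assumed property (4) of $\hat\theta$, we have that, for $\theta \in \Theta$ and $x_s \in (x_r, \omega(F_\theta))$,

$$L_{obs}(x_s, \theta \mid \mathbf{x}) = \frac{(s-1)!}{(s-r-1)!}\, \Psi(\theta \mid \mathbf{x})\, l_\theta(x_s) \le \frac{(s-1)!}{(s-r-1)!}\, \Psi(\hat\theta(\mathbf{x}) \mid \mathbf{x})\, l_\theta(x_s),$$

where $l_\theta(x_s) = G_{\theta,x_s}(x_r)^{r}\, \big(1 - G_{\theta,x_s}(x_r)\big)^{s-r-1}$. Now, using the well-known expression for the mode of the probability density of a beta distribution with parameters $s - r$ and $r + 1$, which by assumption are larger than 1, as well as the assumed strict monotonicity of $F_\theta$ for all $\theta \in \Theta$, we obtain that, for any $\theta \in \Theta$, the function $l_\theta$ attains its global maximum on $(x_r, \omega(F_\theta))$ at the unique point $x_s(\theta, \mathbf{x}) = F_\theta^{-1}\big(1 - (1 - F_\theta(x_r))^{(s-1)/r}\big)$, at which $G_{\theta, x_s}(x_r) = r/(s-1)$, and that the maximum value $l_\theta(x_s(\theta, \mathbf{x}))$ does not depend on $\theta$. Since $\mathbf{x}$ was arbitrary, this shows that the predictor of $R_s$ and the estimator of $\theta$ as defined in (5) are indeed the MOLP of $R_s$ and the PMOLE of $\theta$. Thus, the proof is complete.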
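The beta-mode argument in the proof can be turned into a small numerical recipe. The following Python sketch assumes that, for a fixed parameter value, the maximizing prediction value solves $G_{\theta,x_s}(x_r) = r/(s-1)$, i.e., $H_\theta(x_s) = \frac{s-1}{r} H_\theta(x_r)$ with cumulative hazard $H_\theta = -\ln(1 - F_\theta)$. The function names and the two hazard helpers are hypothetical and introduced only for illustration; to obtain the MOLP itself, the parameter value would have to be replaced by a PMOLE.

```python
import math
from typing import Callable, Tuple


def molp_given_theta(
    cum_hazard: Callable[[float], float],
    inv_cum_hazard: Callable[[float], float],
    x_r: float,
    r: int,
    s: int,
) -> float:
    """Solve H(x_s) = ((s - 1) / r) * H(x_r) for x_s, where H = -ln(1 - F).

    Assumption: this is the maximizer of the observed predictive likelihood
    in x_s for a fixed parameter value (beta-mode argument).
    """
    if not 1 <= r < s - 1:
        raise ValueError("requires 1 <= r < s - 1")
    return inv_cum_hazard((s - 1) / r * cum_hazard(x_r))


def exponential_hazard_pair(mu: float, sigma: float) -> Tuple[Callable, Callable]:
    """Cumulative hazard of Exp(mu, sigma) and its inverse."""
    return (lambda x: (x - mu) / sigma, lambda t: mu + sigma * t)


def extreme_value_hazard_pair(mu: float, sigma: float) -> Tuple[Callable, Callable]:
    """Cumulative hazard exp((x - mu) / sigma) of EV(mu, sigma) and its inverse."""
    return (
        lambda x: math.exp((x - mu) / sigma),
        lambda t: mu + sigma * math.log(t),
    )


if __name__ == "__main__":
    H, H_inv = exponential_hazard_pair(mu=1.0, sigma=2.0)
    # Agrees with mu + ((s - 1) / r) * (x_r - mu) for the exponential case.
    print(molp_given_theta(H, H_inv, x_r=5.0, r=3, s=6))

    H, H_inv = extreme_value_hazard_pair(mu=0.5, sigma=1.5)
    # Agrees with x_r + sigma * ln((s - 1) / r) for the extreme-value case.
    print(molp_given_theta(H, H_inv, x_r=2.0, r=2, s=5))
```

Working with the cumulative hazard rather than with $F_\theta$ directly avoids evaluating $1 - F_\theta(x_r)$ by subtraction, which can lose precision in the upper tail.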
Remark 3.2 (i) Since the quantity $l_\theta(x_s(\theta, \mathbf{x}))$ in the proof of Theorem 3.1 does not depend on $\theta$, the existence of a function $\hat\theta$ satisfying (4) is also necessary for the existence of a MOLP.
(ii) On inspecting the proof of Theorem 3.1, we find that under the stated assumptions, if any two measurable functions $\hat\theta_i$ :
(iii) It is possible to obtain a predictor by the method of observed predictive likelihood maximization also in the case that $s = r + 1$ by replacing the factor $\mathbb{1}_{(x_r, \omega(F_\theta))}(x_s)$ in the OPLF by $\mathbb{1}_{[x_r, \omega(F_\theta))}(x_s)$. However, this modification leads to the MOLP $\pi^{(s)}_{MOLP} = R_r$, i.e., the last observed record value, which is certainly not a useful predictor. With a view to comparing the MOLP with the MLP, we point out that, in all the examples we will consider in the following subsections, the predictor produced by the method of predictive likelihood maximization is also given by the last observed record value. This problem can be overcome by using another method of prediction or by applying appropriate prediction intervals (cf., e.g., Awad and Raqab 2000).
(iv) Since the function $\Psi$ in (3) does not depend on $s$, the PMOLE of $\theta$ does not depend on $s$, either. Hence, $\hat\theta_{MOL}$ does not depend on which future record value one aims at predicting. We point out that the PMLE is not guaranteed to be free of the deficiency of depending on which future record is to be predicted. The occurrence of this shortcoming with estimators produced by the method of predictive likelihood maximization was observed by Bayarri et al. (1987, p. 8).
In the following subsections, we derive the MOLP and the MLP for future record values from different underlying distributions and compare their performance in terms of the mean squared error as well as Pitman's measure of closeness. As far as prediction in models of ordered data is concerned, Nagaraja (1986, Sections 3 and 4) was one of the first to use Pitman's measure of closeness as an alternative criterion to the mean squared error in a comparative study of predictors of future order statistics and record values.

Exponential distribution
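Here, we assume that $(R_n)_{n \in \mathbb{N}}$ is the sequence of record values in a sequence of i.i.d. two-parameter exponential random variables. The density, cumulative distribution and quantile functions of the exponential distribution $Exp(\mu, \sigma)$ with location parameter $\mu \in \mathbb{R}$ and scale parameter $\sigma \in \mathbb{R}_+$ are given by

$$f_\theta(x) = \frac{1}{\sigma}\, e^{-(x - \mu)/\sigma}, \quad F_\theta(x) = 1 - e^{-(x - \mu)/\sigma}, \quad x \ge \mu, \qquad F_\theta^{-1}(x) = \mu - \sigma \ln(1 - x), \quad x \in (0, 1),$$

where $\theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+$. Next, for $r, s \in \mathbb{N}$, $r < s - 1$, we derive the MOLP and present the MLP of $R_s$ based on $\mathbf{R} = (R_1, \dots, R_r)$.
The results concerning the form of the MOLP of $R_s$ are contained in the following proposition.
Proposition 3.1 For $s \in \mathbb{N}$, let $R_1, \dots, R_s$ be the first $s$ record values in a sequence of i.i.d. two-parameter exponential random variables.
(i) If $\mu$ is known, for $1 \le r < s - 1$, the unique MOLP of $R_s$ based on $\mathbf{R}$ is given by $\pi^{(s)}_{MOLP} = \mu + \frac{s-1}{r}(R_r - \mu)$.
(ii) If $\mu$ is unknown, for $2 \le r < s - 1$, the unique MOLP of $R_s$ based on $\mathbf{R}$ is given by $\pi^{(s)}_{MOLP} = R_1 + \frac{s-1}{r}(R_r - R_1)$. The PMOLE of $\mu$ has the form $\hat\mu_{MOL} = R_1$.
Proof Assume that $\mu$ is unknown. With the above choice of $f_\theta$ and $F_\theta$, the function $\Psi(\cdot \mid \mathbf{x})$, $\mathbf{x} \in \mathbb{R}^r_<$, in (3) becomes

$$\Psi(\theta \mid \mathbf{x}) = (x_r - \mu)^{-r}\, \mathbb{1}_{(-\infty, x_1]}(\mu), \quad \theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+. \quad (6)$$

As the scale parameter $\sigma$ is not present in $\Psi$, we only need to find a maximizing function with respect to $\mu$. Let $\hat\theta(\mathbf{x}) = (x_1, \hat\sigma(\mathbf{x}))$, where $\hat\sigma$ is an arbitrary measurable function on $\mathbb{R}^r_<$ with values in $\mathbb{R}_+$. Then, assuming that $r \ge 2$, $\hat\theta$ satisfies (4) with $\Psi(\cdot \mid \mathbf{x})$ given by (6). Combining this with the fact that $F_\theta^{-1}\big(1 - (1 - F_\theta(x))^{(s-1)/r}\big) = \mu + \frac{s-1}{r}(x - \mu)$, $x > \mu$, yields that $\pi^{(s)}_{MOLP} = R_1 + \frac{s-1}{r}(R_r - R_1)$ is the unique maximum observed likelihood predictor of $R_s$ based on $\mathbf{R}$. Finally, we note that for known $\mu$, the derivation of the predictor proceeds along the same lines. The details are omitted.
From the results of Basak and Balakrishnan (2003), it follows that the MLP of $R_s$ based on $\mathbf{R}$ has the slightly different form $\pi^{(s)}_{MLP} = \mu + \frac{s}{r+1}(R_r - \mu)$ if $\mu$ is known, and $\pi^{(s)}_{MLP} = R_1 + \frac{s}{r+1}(R_r - R_1)$ if $\mu$ is unknown.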

Comparison based on the MSE
The mean squared errors of the MOLPs are given in the following lemma, where MSE(π (s) (i) If μ is known, for r , s ∈ N, 1 ≤ r < s − 1, the MSE of the MOLP of R s based on R is given by (ii) If μ is unknown, for r , s ∈ N, 2 ≤ r < s − 1, the MSE of the MOLP of R s based on R is given by Proof We present the derivation of the MSE in the case of unknown μ only. The proof of the other case proceeds along the same lines. We have r .
By the results in Basak and Balakrishnan (2003), the MSE of the MLP is given by if μ is known, and by if μ is unknown. As for the relative performance of the predictors in terms of mean squared error, we have the following result. For an unknown location parameter, the MOLP turns out to have smaller MSE than the MLP, throughout.
Proof Again, we present only the proof of the case of unknown μ. Simple algebra yields  Hence, the MOLP has a smaller MSE than the MLP with no restrictions on r and s. Thus, the assertion is proved.
As can be seen directly from the above proof, the difference in Theorem 3.2(ii) increases in $s$. Figure 1 contains the contour plots of the relative efficiency of $\pi^{(s)}_{MLP}$ relative to $\pi^{(s)}_{MOLP}$ based on $\mathbf{R}$ for all valid combinations of $r$ and $s$ in the range 1-200. Table 1 contains relative efficiencies of the predictors for selected values of $r$ and $s$. It can be seen from the contour plots as well as the table that the relative efficiency of the MLP relative to the MOLP is highest for small values of $r$ and decreases as $r$ increases. The gains in efficiency are in the high single-digit and low double-digit percentage range. From the perspective of practical applications, where rather small numbers of observed record values are common, this fact makes the MOLP an attractive alternative to the MLP.
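The MSE comparison can be checked empirically by simulation. The following Python sketch is a Monte Carlo illustration under stated assumptions, not the authors' computation: it uses the standard fact that exponential record values can be generated as $R_n = \mu + \sigma(E_1 + \dots + E_n)$ with i.i.d. standard exponential $E_i$, and it assumes the predictor forms $R_1 + \frac{s-1}{r}(R_r - R_1)$ (MOLP) and $R_1 + \frac{s}{r+1}(R_r - R_1)$ (MLP) for unknown $\mu$. All function names are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_exponential_records(
    s: int, mu: float, sigma: float, rng: random.Random
) -> List[float]:
    """First s upper records from Exp(mu, sigma): R_n = mu + sigma * (E_1 + ... + E_n)."""
    records, total = [], 0.0
    for _ in range(s):
        total += rng.expovariate(1.0)
        records.append(mu + sigma * total)
    return records


def molp_unknown_mu(records: List[float], r: int, s: int) -> float:
    """Assumed MOLP form for unknown location: R_1 + ((s - 1) / r) * (R_r - R_1)."""
    return records[0] + (s - 1) / r * (records[r - 1] - records[0])


def mlp_unknown_mu(records: List[float], r: int, s: int) -> float:
    """Assumed MLP form for unknown location: R_1 + (s / (r + 1)) * (R_r - R_1)."""
    return records[0] + s / (r + 1) * (records[r - 1] - records[0])


def empirical_mse(
    r: int,
    s: int,
    n_rep: int = 20000,
    mu: float = 0.0,
    sigma: float = 1.0,
    seed: int = 1,
) -> Tuple[float, float]:
    """Return (MSE of MOLP, MSE of MLP), estimated from n_rep simulated record sequences."""
    if not 2 <= r < s - 1:
        raise ValueError("requires 2 <= r < s - 1")
    rng = random.Random(seed)
    se_molp = se_mlp = 0.0
    for _ in range(n_rep):
        rec = simulate_exponential_records(s, mu, sigma, rng)
        se_molp += (rec[s - 1] - molp_unknown_mu(rec, r, s)) ** 2
        se_mlp += (rec[s - 1] - mlp_unknown_mu(rec, r, s)) ** 2
    return se_molp / n_rep, se_mlp / n_rep


if __name__ == "__main__":
    mse_molp, mse_mlp = empirical_mse(r=3, s=6)
    print(f"MSE(MOLP) ~ {mse_molp:.3f}, MSE(MLP) ~ {mse_mlp:.3f}")
    print(f"relative efficiency MSE(MLP) / MSE(MOLP) ~ {mse_mlp / mse_molp:.3f}")
```

Both predictors are evaluated on the same simulated sequences, so the estimated difference in MSE is less noisy than either estimate on its own.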

Comparison based on Pitman's measure of closeness
In preparation for the comparison of the predictors in terms of Pitman's measure of closeness, we first compute the Pitman efficiency of the MLP relative to the MOLP. The proof of the result is along the lines of that of the corresponding result in the comparison of the BLUP (best linear unbiased predictor) and the BLEP (best linear equivariant predictor) of order statistics from two-parameter exponential distributions presented in Nagaraja (1986, p. 14). In the following, $F(m, n)$ denotes the F-distribution with parameters $m, n \in \mathbb{N}$. The corresponding cdf will be denoted by $F(\cdot \mid m, n)$. We also set $q = \frac{2(r+1)(s-r)}{(2r+1)(s-r-1)}$.
(ii) If μ is unknown, for r , s ∈ N, 2 ≤ r < s − 1, we have that The following result characterizes the performance of the MLP relative to the MOLP based on Pitman's measure of closeness:

Proposition 3.3 For $s \in \mathbb{N}$, let $R_1, \dots, R_s$ be the first $s$ record values in a sequence of i.i.d. two-parameter exponential random variables. The MOLP of $R_s$ based on $\mathbf{R}$ outperforms the MLP of $R_s$ based on $\mathbf{R}$ in terms of Pitman's measure of closeness irrespective of whether $\mu$ is known or not, i.e., for any $r, s$ satisfying $1 \le r \le s - 2$ ($\mu$ known) or $2 \le r \le s - 2$ ($\mu$ unknown), we have that

$$P\big(|R_s - \pi^{(s)}_{MOLP}| < |R_s - \pi^{(s)}_{MLP}|\big) > \tfrac{1}{2}.$$

Proof First, assume that $\mu$ is unknown. Let $X \sim F(2(r-1), 2(s-r))$, $r \ge 2$, $r \le s - 2$. The MLP is closer to $R_s$ than the MOLP exactly when $(R_s - R_r)/(R_r - R_1) < (2r+1)(s-r-1)/(2r(r+1))$, an event with probability $P(X > qr/(r-1))$. Assume first that $r > 2$. Since for this choice of $r$ and $s$ the F-distribution satisfies the mode-median-mean inequality (see Groeneveld and Meeden 1977), it suffices to show that $E(X) < qr/(r-1)$. Since $s - r \ge 2$, the expectation of $X$ is finite, and we have that $E(X) = (s-r)/(s-r-1) < qr/(r-1)$ if and only if $0 < 3r + 1$.
This yields the assertion for the case that r > 1. If r = 1, an analogous argument as in the first part of the proof yields the desired conclusion.
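The Pitman closeness comparison for exponential record values can likewise be illustrated by simulation. The Python sketch below estimates the probability that the MLP is closer to $R_s$ than the MOLP when $\mu$ is unknown. As before, it is an illustration under assumptions: records are generated via partial sums of standard exponentials, and the predictor forms $R_1 + \frac{s-1}{r}(R_r - R_1)$ (MOLP) and $R_1 + \frac{s}{r+1}(R_r - R_1)$ (MLP) are assumed. The function name is hypothetical.

```python
import random


def pitman_frequency_mlp_closer(
    r: int, s: int, n_rep: int = 20000, seed: int = 7
) -> float:
    """Estimate P(|R_s - MLP| < |R_s - MOLP|) for exponential records, mu unknown.

    Location and scale are set to 0 and 1; both predictors are location-scale
    equivariant, so the probability does not depend on this choice.
    """
    if not 2 <= r <= s - 2:
        raise ValueError("requires 2 <= r <= s - 2")
    rng = random.Random(seed)
    mlp_closer = 0
    for _ in range(n_rep):
        # Standard exponential records: partial sums of i.i.d. Exp(1) variables.
        total, records = 0.0, []
        for _ in range(s):
            total += rng.expovariate(1.0)
            records.append(total)
        r_1, r_r, r_s = records[0], records[r - 1], records[s - 1]
        molp = r_1 + (s - 1) / r * (r_r - r_1)  # assumed MOLP form
        mlp = r_1 + s / (r + 1) * (r_r - r_1)   # assumed MLP form
        if abs(r_s - mlp) < abs(r_s - molp):
            mlp_closer += 1
    return mlp_closer / n_rep


if __name__ == "__main__":
    for r, s in [(2, 4), (3, 6), (5, 10)]:
        p = pitman_frequency_mlp_closer(r, s)
        print(f"r={r}, s={s}: estimated P(MLP closer) = {p:.3f}")
```

An estimated frequency below 1/2 corresponds to the MOLP being Pitman-closer to $R_s$ than the MLP for the chosen $r$ and $s$.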

Extreme-value distribution
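Next, we assume that $(R_n)_{n \in \mathbb{N}}$ is the sequence of record values in a sequence of i.i.d. extreme-value random variables. The density, cumulative distribution and quantile functions of the extreme-value or reversed Gumbel distribution $EV(\mu, \sigma)$ with location parameter $\mu \in \mathbb{R}$ and scale parameter $\sigma \in \mathbb{R}_+$ are given by

$$f_\theta(x) = \frac{1}{\sigma}\, e^{(x-\mu)/\sigma} \exp\big(-e^{(x-\mu)/\sigma}\big), \quad F_\theta(x) = 1 - \exp\big(-e^{(x-\mu)/\sigma}\big), \quad x \in \mathbb{R}, \qquad F_\theta^{-1}(x) = \mu + \sigma \ln(-\ln(1 - x)), \quad x \in (0, 1),$$

where $\theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+$. Next, for $r, s \in \mathbb{N}$, $r < s - 1$, we derive the MOLP and the MLP of $R_s$ based on $\mathbf{R} = (R_1, \dots, R_r)$. As for the form of the MOLP of $R_s$ based on $\mathbf{R}$, we have the following result.
Proof With the above choice of $f_\theta$ and $F_\theta$, the function $\Psi(\cdot \mid \mathbf{x})$, $\mathbf{x} \in \mathbb{R}^r_<$, in (3) becomes

$$\Psi(\theta \mid \mathbf{x}) = \sigma^{-r} \exp\Big(-\frac{1}{\sigma} \sum_{i=1}^{r-1} (x_r - x_i)\Big), \quad \theta = (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+. \quad (8)$$

Since the location parameter $\mu$ is not present in $\Psi$, we only need to find a maximizing function with respect to $\sigma$. Let $\hat\theta(\mathbf{x}) = \big(\hat\mu(\mathbf{x}), \frac{1}{r} \sum_{i=1}^{r-1} (x_r - x_i)\big)$, where $\hat\mu$ is an arbitrary measurable function on $\mathbb{R}^r_<$ with values in $\mathbb{R}$. Then, assuming that $r \ge 2$, $\hat\theta$ satisfies (4) with $\Psi(\cdot \mid \mathbf{x})$ given by (8). Combining this with the fact that $F_\theta^{-1}\big(1 - (1 - F_\theta(x))^{(s-1)/r}\big) = x + \sigma \ln((s-1)/r)$ yields that $\pi^{(s)}_{MOLP} = R_r + \hat\sigma_{MOL} \ln((s-1)/r)$ is the unique maximum observed likelihood predictor of $R_s$ based on $\mathbf{R}$, where the PMOLE of $\sigma$ takes the form $\hat\sigma_{MOL} = \frac{1}{r} \sum_{i=1}^{r-1} (R_r - R_i)$.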
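For concreteness, the extreme-value MOLP can be computed from an observed record sequence in a few lines. The Python sketch below assumes the scale estimator $\hat\sigma_{MOL} = \frac{1}{r}\sum_{i=1}^{r-1}(R_r - R_i)$, which is consistent with the statement that $r\hat\sigma_{MOL}$ is gamma distributed with shape $r - 1$, and the predictor form $R_r + \hat\sigma_{MOL}\ln((s-1)/r)$ stated above. The function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def sigma_mol(records: Sequence[float]) -> float:
    """Assumed PMOLE of sigma: (1 / r) * sum_{i < r} (R_r - R_i)."""
    r = len(records)
    if r < 2:
        raise ValueError("at least two observed records are required")
    last = records[-1]
    return sum(last - x for x in records[:-1]) / r


def molp_extreme_value(records: Sequence[float], s: int) -> float:
    """MOLP of the s-th record: R_r + sigma_mol * ln((s - 1) / r), for s > r + 1."""
    r = len(records)
    if s <= r + 1:
        raise ValueError("requires s > r + 1")
    return records[-1] + sigma_mol(records) * math.log((s - 1) / r)


if __name__ == "__main__":
    observed = [0.0, 1.0, 1.5]  # hypothetical upper record values, r = 3
    print("sigma_MOL:", sigma_mol(observed))
    for s in (5, 6, 8):
        print(f"MOLP of R_{s}:", molp_extreme_value(observed, s))
```

Because $\hat\sigma_{MOL}$ does not depend on $s$, the same scale estimate is reused for every future record index, and the predictor grows logarithmically in $s$.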

Remark 3.3
The PMOLE and the MLE of σ coincide. For the MLE of σ , we refer to Arnold et al. (1998, p. 127) (see also Remark 3.4).
Since extreme-value record values are generated from a location-scale family, it does not come as a surprise that linear prediction of extreme-value record values has already been treated (cf. Arnold et al. 1998, Example 5.6.3, p. 152, and section 5.6.2). However, it appears that maximum likelihood prediction of extreme-value record values has not been considered in the literature so far. The following result contains the expression of the MLP of R s based on R .
respectively. If $s = r + 1$, the MLP takes the form $\pi^{(s)}_{MLP} = R_r$.
Hence, the function μ → G(x s , μ, σ ), μ ∈ R, attains its global maximum value at a unique point and the global maximum value equals Observe that the function G(x s , x s − σ ln(s), σ ) ∝ e − s σ x s . Consequently, it suffices to show that

attains its global maximum value at a unique point, which is given by
Combining all these results completes the proof.

Remark 3.4
Observe that neither the PMLE of $\mu$ nor the PMLE of $\sigma$ coincides with the MLE of the corresponding parameter. The MLEs of $\mu$ and $\sigma$ are given by $\hat\mu = R_r - \hat\sigma \ln(r)$ and $\hat\sigma = \frac{1}{r} \sum_{i=1}^{r-1} (R_r - R_i)$. Their derivation can be found, e.g., in Arnold et al. (1998, p. 127). Moreover, the PMLE of $\mu$ depends on which future record value is to be predicted, as the appearance of the index of the future record value in the expression for $\hat\mu_{ML}$ reveals.

Comparison based on the MSE
In preparation for the comparison of the MOLP to the MLP, we derive their MSEs. In what follows, for $r, s \in \mathbb{N}$, $r < s$, the following notation will be used: Combining this with the expression for the expected value of an extreme-value record value (see Arnold et al. 1998, p. 32), the claimed expression for the bias of the MOLP readily follows. As for the MSE of the MOLP, observe first that by the fact that the spacings $R_i - R_{i-1}$, $i \ge 2$, are independent with $R_i - R_{i-1} \sim Exp(\sigma/(i-1))$, we have $r\hat\sigma_{MOL} \sim Gamma(r-1, \sigma)$, where $Gamma(a, b)$ denotes the gamma distribution with shape parameter $a \in \mathbb{R}_+$ and scale parameter $b \in \mathbb{R}_+$. Consequently, $E((\hat\sigma_{MOL})^2) = \sigma^2 (r-1)/r$. Hence, using again the independence of spacings, we obtain Given that $\hat\sigma_{ML} = (r/(r+1))\,\hat\sigma_{MOL}$, the derivation of $MSE(\pi^{(s)}_{MLP})$ proceeds along the same lines.
The function $x \mapsto x^{-1}$ is decreasing on $\mathbb{R}_+$, and $\ln((s-1)/r) = \int_r^{s-1} x^{-1}\, dx$. Thus, we have that $\alpha_{r,s}(1) - \ln((s-1)/r) > 0$. Hence, since the factor in front of the curly brackets and the second summand between the curly brackets are obviously positive, the assertion follows.
Table 3 contains relative efficiencies of the predictors for selected $r$ and $s$. The numerical results confirm the superiority of the MOLP over the MLP in terms of mean squared error. However, the gains in efficiency are smaller than in the case of exponential record values. The highest efficiencies are achieved for small values of $r$, and the efficiency gains quickly become negligible as $r$ increases. Still, from the perspective of practical applications, where rather small numbers of observed record values are common, this fact makes the MOLP an attractive alternative to the MLP.

Comparison based on Pitman's measure of closeness
We compare the predictors based on Pitman's measure of closeness. As in the case of Lemma 3.3, we use the same arguments as in Nagaraja (1986, p. 14) to compute the Pitman efficiency. The corresponding result is contained in the following lemma. Below, $HExp(\alpha_1, \dots, \alpha_n)$ denotes the hypoexponential distribution with pairwise different rate parameters $\alpha_1, \dots, \alpha_n \in \mathbb{R}_+$ (see Ross 2014).

Remark 3.5
Observe that $U/T$ from Proposition 3.5 is distributed as the ratio of two hypoexponential random variables. Hence, using Kadri and Smaili (2014, Theorem 1), we obtain a representation of the cdf of $U/T$. For $x \in \mathbb{R}_+$, where $I_\cdot(a, b)$ is the regularized incomplete beta function with parameters $a, b > 0$. A more compact representation of $F_{U/T}$ can be achieved if one observes that the sum in (10) is an alternating binomial sum. More specifically, we have that for $x \in \mathbb{R}_+$, (1, r − 1). Using the fact that iterated forward differences can be expressed by alternating binomial sums, we obtain a compact representation of $F_{U/T}$ as where the $(s - r - 1)$-fold forward difference is to be computed for $i = 0$. Thus, the Pitman efficiency of the MLP relative to the MOLP can be expressed as with the $(s - r - 1)$-fold forward difference computed for $i = 0$. Since alternating sums can be numerically problematic, it is advisable to use high-precision arithmetic in order to compute the Pitman efficiency efficiently and accurately. See sumBinomMpfr() in the R package Rmpfr and its documentation.
Figure 3b contains the contour plot of the Pitman efficiency of $\pi^{(s)}_{MLP}$ relative to $\pi^{(s)}_{MOLP}$ for all valid combinations of $r$ and $s$ in the range 2-100. Table 4 contains Pitman efficiencies of the predictors, which were computed using the expression derived in Remark 3.5, for selected $r$ and $s$. Remarkably, the Pitman efficiencies are almost identical to those of the MLP relative to the MOLP for exponential record values presented in Table 2 (unknown $\mu$).
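The numerical issue with alternating binomial sums can be made concrete with a generic example. The Python sketch below evaluates the $n$-fold forward difference $\Delta^n f(0) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} f(k)$ once in double precision and once in exact rational arithmetic. Python's `fractions.Fraction` is used here as a stand-in for the high-precision arithmetic of Rmpfr, and the test function $f(i) = 1/(i+1)$ is chosen only because its forward difference has the known closed form $(-1)^n/(n+1)$; it is not the function appearing in the representation of $F_{U/T}$.

```python
from fractions import Fraction
from math import comb
from typing import Callable


def forward_difference_at_zero(f: Callable[[int], object], n: int):
    """n-fold forward difference of f at 0 as an alternating binomial sum."""
    return sum((-1) ** (n - k) * comb(n, k) * f(k) for k in range(n + 1))


if __name__ == "__main__":
    n = 60
    # Exact rational arithmetic: no cancellation error.
    exact = forward_difference_at_zero(lambda i: Fraction(1, i + 1), n)
    # Double precision: the terms are many orders of magnitude larger than the
    # result, so rounding errors can swamp the true value.
    approx = forward_difference_at_zero(lambda i: 1.0 / (i + 1), n)
    print("exact      :", exact, "=", float(exact))
    print("float64    :", approx)
    print("closed form:", (-1) ** n / (n + 1))
```

Exact rational arithmetic is practical only when the summands are rational; for summands involving the incomplete beta function, arbitrary-precision floating point (as provided by Rmpfr) would be the natural counterpart.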

Power-function distribution
In the previous two subsections, we have demonstrated by the examples of the exponential and extreme-value distributions the simplicity of deriving the MOLP as well as its superior performance over the MLP. With the power-function distribution, the situation is different. The MOLP can be shown to uniquely exist, though its computation is rather laborious. Moreover, it turns out that the MLP does not exist.
In what follows, we assume that (R n ) n∈N is the sequence of record values in a sequence of i.i.d. power-function random variables. The density, cumulative distribution and quantile functions of the power-function distribution Pow(θ) (or Beta(θ, 1)) with shape parameter θ ∈ R + are given by f θ (x) = θ x θ−1 , F θ (x) = x θ , x ∈ (0, 1) and F −1 θ (x) = x 1/θ , x ∈ [0, 1). Next, for r , s ∈ N, r < s − 1, we derive the MOLP of R s based on R = (R 1 , . . . , R r ). We also show that the MLP does not exist in the present situation.
Obviously, the difference of the first and the second power series has nonnegative coefficients. This implies the desired inequality. Consequently, h 1 (θ ) < 0, θ ∈ R + . This completes the proof of the assertion. Proof For x ∈ (0, 1) r < , the PLF satisfies x s ∈ (x r , 1), θ ∈ R + . For any fixed θ ∈ R + , lim x s →1 L rv (x s , θ|x ) = ∞. Hence, no global maximum exists.

Illustration
The prediction of future record values is illustrated for exponential distributions as in Sect. 3.1. In the literature, there are several real datasets of record values; an underlying exponential distribution is assumed by Dunsmore (1983) for data from a rock crushing machine (see also Awad and Raqab 2000) and by Razmkhah and Ahmadi (2013) for annual flood loss data. In the particular situation of an exponential distribution with unknown location parameter $\mu$ and scale parameter $\sigma > 0$, the bias and the mean squared prediction error of both the MLP and the MOLP can be stated explicitly. By noting that $E(R_r) = \mu + \sigma r$, $r \in \mathbb{N}$ (see Arnold et al. 1998), we find for $s > r + 1$:

$$E\big(\pi^{(s)}_{MOLP} - R_s\big) = -\sigma\, \frac{s-1}{r}, \qquad E\big(\pi^{(s)}_{MLP} - R_s\big) = -\sigma\, \frac{2s - r - 1}{r+1},$$

such that both predictors are downward-biased. Since $\frac{2s-r-1}{r+1} - \frac{s-1}{r} = \frac{(r-1)(s-r-1)}{r(r+1)} > 0$ for $r \ge 2$, the MOLP always has a smaller absolute bias than the MLP. The mean squared prediction errors $MSE(\pi^{(s)}_{MOLP})$ and $MSE(\pi^{(s)}_{MLP})$ are stated in Sect. 3.1.1. Figure 4a, b shows the bias (in units of $\sigma$) and the MSE (in units of $\sigma^2$) of both predictors for several values of the number $r$ of observations and $s = r + 2, r + 3, r + 4$.

Conclusion
Based on the observed predictive likelihood function first studied as a tool for deriving predictive inferences by Bayarri et al. (1987), a novel likelihood-based prediction procedure is proposed. The prediction method is successfully applied to the problem of future (upper) record value prediction, and for underlying exponential and extreme-value distributions, it is demonstrated that the resulting predictors exhibit superior performance relative to predictors produced by the widely applied maximum likelihood prediction procedure. The obtained predictors will be useful in reliability applications involving modeling repairable systems and, more generally, in areas where the underlying stochastic dynamics are adequately described by nonhomogeneous Poisson processes.