Maximum product of spacings prediction of future record values

A spacings-based prediction method for future upper record values is proposed as an alternative to maximum likelihood prediction. For an underlying family of distributions with continuous cumulative distribution functions, the general form of the predictor as a function of the estimator of the distributional parameters is established. A connection between this method and the maximum observed likelihood prediction procedure is shown. The maximum product of spacings predictor turns out to be useful to predict the next record value in contrast to likelihood-based procedures, which provide trivial predictors in this particular case. Moreover, examples are given for the exponential and the Pareto distributions, and a real data set is analyzed.


Introduction
A general procedure for estimating parameters, termed maximum product of spacings estimation, was proposed independently by Cheng and Amin (1983) and Ranneby (1984) as an alternative to maximum likelihood estimation in particular situations of continuous, univariate distributions. In the sequel, the estimation method was applied, extended and further theoretically studied in, e.g., Ekström (1998Ekström ( , 2008, Shao and Hahn (1999) and Anatolyev and Kosenok (2005). Here, we adopt the method of Suppose that X 1 , X 2 , . . . is an infinite sequence of independent and identically distributed (i.i.d.) continuous random variables with cumulative distribution function (cdf) F. An observation X j is called an (upper) record value provided it is greater than all previously observed values. More specifically, defining the record times as L(1) = 1, L(n + 1) = min{ j > L(n) | X j > X L(n) }, n ∈ N, the sequence (R n ) n∈N = (X L(n) ) n∈N is referred to as the sequence of (upper) record values based on (X n ) n∈N [see Arnold et al. (1998); Nevzorov (2001)]. The study of record values dates back to Chandler (1952) providing a natural model for the sequence of successive extremes in an i.i.d. sequence of random variables. The structure of record values also appears in the context of minimal repair of a system, and there is a close connection to occurrence times of a non-homogeneous Poisson process (NHPP); namely, under mild conditions, the epoch times of an NHPP and upper record values are equal in distribution [see Gupta and Kirmani (1988)].
We are concerned with the problem of predicting the occurrence of a future record value R s based on the first r , r < s, (observed) record values R = (R 1 , . . . , R r ). This prediction problem has been studied by several authors. Here, we focus on non-Bayesian prediction. In the one-sample case, Raqab (2007) derived the best linear unbiased predictor, the best linear equivariant predictor, the maximum likelihood predictor as well as the conditional median predictor of the s-th record value R s from a Type-II left censored sample with a two-parameter exponential distribution. His findings supplement and generalize results of Ahsanullah (1980), Basak and Balakrishnan (2003) and Nagaraja (1986, Sect. 4). Awad and Raqab (2000) provide a comparative study of several predictors of the s-th record value R s based on the first r observed record values from a one-parameter exponential distribution. Linear unbiased prediction of future Pareto record values is discussed in Paul and Thomas (2016), and maximum likelihood prediction of future Pareto record values is studied in Raqab (2007). Since the model of record values is contained in the generalized order statistics model [see Kamps (1995Kamps ( , 2016], all results pertaining to prediction of future generalized order statistics can be specialized to solve the prediction problem for record values [see, e.g., Burkschat (2009)]. Bayesian prediction methods for future record values were first discussed by Dunsmore (1983) and have subsequently been applied to various distribution families [cf. Madi and Raqab (2004); Ahmadi and Doostparast (2006); Nadar and Kızılaslan (2015)]. It should be noted that, under exponential as well as under Pareto distributions, maximum likelihood prediction of the subsequent record value R r +1 becomes trivial, since the respective predictor is given by R r , i.e., the predictor coincides with the last observed record value in the model. However, by construction, record values are strictly ordered; thus, the maximum likelihood predictor of the (r + 1)th record value based on the first r record values yields a useless prediction in practical situations, e.g., when aiming at predicting the next record claim in an insurance company. In the following, a prediction principle, referred to as the maximum product of spacings prediction, will be introduced and studied to overcome this shortcoming [see also Volovskiy (2018) for further details]. A similar approach has been mentioned by Raqab et al. (2019) for records from a Weibull distribution, recently. Volovskiy and Kamps (2020) [see also Volovskiy (2018)] have introduced a new general likelihood-based prediction procedure, the so-called maximum observed likelihood prediction method, and applied it to predict future record values; although such a predictor may outperform the respective maximum likelihood predictor in terms of both criteria, mean squared error and Pitman closeness, it may also share the same drawback when predicting the very next record value.
For the proposed prediction procedure by means of maximizing the geometric mean of spacings of suitably transformed and normalized record values data, a general representation of the predictor as a function of an estimator of the underlying distributional parameters is established. Furthermore, its relation to the maximum observed likelihood predictor is demonstrated via a heuristic approximation argument. It is pointed out that the spacings-based method retains the desirable properties of the likelihoodbased procedure while at the same time avoids its deficiency of not being able to produce a useful prediction for the next record value. The prediction procedure is illustrated by deriving predictors of future exponential and Pareto record values. A real data example is shown under the assumption of an underlying Pareto distribution.

Maximum product of spacings prediction procedure
The prediction procedure we are about to present derives its motivation from the maximum product of spacings estimation method introduced independently by Cheng and Amin (1983) and Ranneby (1984) as an alternative to maximum likelihood estimation. The heuristics underlying the maximum product of spacings estimation method are as follows. Let F = {F θ | θ ∈ Θ}, Θ ⊆ R d , be a parameterized family of continuous cumulative distribution functions on R with Lebesgue density functions { f θ | θ ∈ Θ}. Furthermore, let X 1 , . . . , X n be i.i.d. random variables with cdf F θ 0 ∈ F, where the parameter vector θ 0 ∈ Θ is unknown. Now, observe that the spacings with F θ 0 (X 0:n ) := 0, are distributed as spacings of an ordered sample U 1:n , . . . , U n:n of size n from the standard uniform distribution [see David and Nagaraja (2003)]. Since, in expectation, the sample U 1:n , . . . , U n:n induces an equidistant partition of the unit interval, obtaining an estimate for θ 0 by tuning the parameter vector such that the spacings (1) become as equal as possible seems a plausible way to go. The maximum product of spacings estimation procedure achieves this by maximizing the geometric mean of the spacings, i.e. the function with respect to θ ∈ Θ, where F θ 0 (X n+1:n ) := 1. For further details on this estimation method, we refer the reader to the respective articles referred to in the introduction.
In order to apply the above reasoning to the problem of predicting future record values, several adjustments to the procedure in the estimation set-up will be necessary, which primarily are due to the non-i.i.d. structure of the data at hand as well as the structure of the inferential task. In what follows, E x p(1) denotes the standard exponential distribution. Let (R n ) ∞ n=1 be the sequence of record values in a sequence of i.i.d. random variables with continuous cdf F θ 0 ∈ F. In what follows, we are primarily concerned with the problem of predicting R s based on where (R n ) ∞ n=1 is the sequence of record values in an i.i.d. sequence of standard exponential random variables, and where, for n ∈ N, Arnold et al. (1998, p. 9)]. Apart from this fact, the following result, a proof of which can be found in Nevzorov (2001, pp. 12-13) will prove crucial for the following discussion.
Combining the distributional identity (2) and Lemma 1, we conclude that where H θ 0 (R 0 ) = U 0:(s−1) := 0 and U s:(s−1) := 1. In light of the discussion of the maximum product of spacings estimation method, the distributional identity (3) motivates the following definition. For a distribution with cdf F, let α(F) and ω(F) denote, respectively, the left and right endpoints of the support of the distribution. In what follows, for n ∈ N, we define with x 0 = −∞, and where we use the notational convention that, for an interval I ⊆ R and n ∈ N, I n < = {(x 1 , . . . , x n ) ∈ I n | x 1 < x 2 < · · · < x n }. The set Z n is a collection of all admissible combinations of the parameter vector θ and the record values sample x 1 , . . . , x n of size n. In what follows, for a subset B ⊂ R n , B n |B will denote the restriction of the Borel σ -algebra B n on B. and then ν s−r (R) is called a maximum product of spacings predictor (MPSP) of R s based on R. Any such predictor will be denoted by π (s) M P S P .
Next, we establish the general form of the MPSP as a function of the underlying estimator of the parameter vector. It turns out that the estimator is obtained by maximizing the function θ → P r (θ, x ). In what follows, the quantile function of a cdf F will be denoted by F −1 .
exists with the property that, for any fixed θ ∈ Θ, we have with P r as defined in (4), then a maximum product of spacings predictor of R s based on R is given by Proof We have that, for any fixed x = (x 1 , . . . , x r ) ∈ R r < , and a suitable constant c depending only on r and s, and where in the second line we used that, for fixed θ , x r and x s , with ν 0 = −∞. Now, using the well-known expression for the mode of the probability density function of a beta distribution with parameters s − r + 1 and r + 1, as well as the continuity of F θ for all θ ∈ Θ, we obtain that, for θ ∈ Θ, the function possesses at least one maximizing point, and any of these can be obtained as a solution of the equation with respect to x s ∈ (x r , ω(F θ )). A particular solution, say x s (θ, x ), of this equation is given by . Moreover, we have that l θ (x s (θ, x )) is independent of θ . Thus, combining Eqs. (9) and (10) as well as using property (6) ofθ and the equality (8), we conclude thatθ and the function ν defined by satisfy (5). Hence, by Definition 1, the (s − r )th coordinate function of ν composed with R yields the MPSP.
Remark 1 (i) The maximum product of spacings prediction procedure produces predictions consistently in the following sense: In determining a prediction value ν s−r (x ) for R s based on a sample x of R 1 , . . . , R r , by the definition of the prediction procedure, one is also required to produce values ν 1 (x ), . . . , ν s−r −1 (x ) such that (5) is satisfied for ν(x ) = (ν 1 (x ), . . . , ν s−r (x )). It is then tempting to take ν 1 (x ), . . . , ν s−r −1 (x ) as prediction values for R r +1 , . . . , R s−1 and ask how these prediction values relate to those one would obtain by computing prediction values according to Definition 1. Since the values ν 1 (x ), . . . , ν s−r (x ) are available in closed form via formula (11), it is obvious that, fors such that r <s < s, π (s) M P S P (x ) = νs −r (x ), i.e. taking νs −r (x ) as a prediction value for Rs amounts to predicting Rs via Definition 1. (ii) When predicting the very next record (s = r + 1), the MPSP does not become trivial in general, i.e. π (r +1) M P S P will exceed R r . (iii) Since k-th record values [see Dziubdziela and Kopociński (1976)

Relation to maximum observed likelihood prediction
Recently, Volovskiy and Kamps (2020) introduced the so-called maximum observed likelihood prediction procedure (MOLP) and used it to derive predictors for future record values. More specifically, the MOLP derives a predictor of a random variable Y based on a possibly vector-valued random variable X with joint pdf f X ,Y θ by maximizing the observed predictive likelihood function L obs defined by with respect to θ and y. In the case of predicting R s based on R = (R 1 , . . . , R r ), the maximum observed likelihood predictor takes on the form [see Volovskiy and Kamps (2020, Theorem 3.3), Volovskiy (2018, Theorem 5.3)] which is quite similar to the form of π (s) M P S P in (7), although the procedures to derive these predictors seem to be totally different. Here, the case s = r + 1 does not lead to a useful predictor, in general. In particular situations, the MOLP was shown to outperform a respective maximum likelihood predictor in terms of mean squared error and Pitman closeness. In (12), the functionθ is such that where the function Ψ is given by Assuming that the cdfs F θ , θ ∈ Θ, have a common finite left endpoint of the support, say x 0 = α(F θ ), and using the approximation Thus, under the assumption of the finiteness of a common left endpoint of the support of the underlying family of distributions, the objective functions used to estimate the distributional parameters in the maximum observed likelihood and the maximum product of spacings prediction method, respectively, are approximately proportional to each other. This as well as the fact that, for large s, s−1 r ≈ s r , implies that Note that the above rather heuristic analysis does not imply any statement about the quality of this approximation. A comparison of the functional forms of the predictors reveals that while the MOLP yields the last observed value as prediction value for the next observation and, hence, cannot be considered a sensible prediction method in this particular setting, the maximum product of spacings method produces a prediction value different from the last observation. At the same time, both prediction procedures share the desirable properties of allowing to derive the general form of the predictor [see Theorem 1 and (12)] as well as the simplicity of deriving the predictors for specific distribution families, as is illustrated by the examples in the following section.

Examples
The MPSP-approach is illustrated for exponential and Pareto distributions.

Exponential distribution
Assume that (R n ) n∈N is the sequence of record values in a sequence of i.i.d. twoparameter exponential random variables. The density, cumulative distribution and quantile functions of the exponential distribution E x p(μ, σ ) with location parameter μ ∈ R and scale parameter σ ∈ R + are given by where θ = (μ, σ ) ∈ R × R + . As far as likelihood-based prediction of future record values is concerned, the MLP of R s based on R = (R 1 , . . . , R r ), r < s, was derived by Gupta and Kirmani (1989) [see also Basak and Balakrishnan (2003)] and takes on the form π (s) while the MOLP of R s based on R was computed by Volovskiy and Kamps (2020) [see also Volovskiy (2018)] and equals π (s) Note that both the MLP and the MOLP yield the prediction R r for R s if s = r +1, and, hence, cannot be considered reasonable prediction methods in this particular situation. In view of Theorem 1, in order to determine an MPSP of R s based on R, it suffices, for any x ∈ R r < , to solve the maximization problem max θ∈Θ: the maximization has to effectively be performed with respect to the location parameter μ only. Because the function f (x) = x/(x +c) r , x ∈ [0, ∞), where c is some positive constant, possesses a unique maximum point, which is given by x = c/(r −1), settinĝ andθ = (μ,σ ), whereσ : R r < → R + is some arbitrary function, we conclude thatθ satisfies (6). Consequently, the unique MPSP of R s based on R is given by Thus, it turns out that in this particular setting the MPSP coincides with the BLUP [see Ahsanullah (1980)]. If the location parameter is known, function P r is independent of the distributional parameters, which considerably simplifies the derivation of the MPSP. In this set-up, the MPSP takes on the form π (s) and, by the results of Basak and Balakrishnan (2003), again is seen to coincide with the BLUP.

Pareto distribution
We assume that (R n ) n∈N is the sequence of record values in a sequence of i.i.d. Pareto random variables. The density, cumulative distribution and quantile functions of the Pareto distribution Pareto(α, β) with scale parameter α ∈ R + and shape parameter β ∈ R + are given by where θ = (α, β) ∈ R 2 + . Maximum likelihood and maximum observed likelihood prediction of R s based on R = (R 1 , . . . , R r ), 2 ≤ r < s, were discussed in Volovskiy (2018), where it was shown that the respective predictor takes on the form Again, from the expressions of the MLP and the MOLP, it is evident that both likelihood-based prediction methods produce R r as predictor for R s if s = r + 1. Next, we determine the MPSP of R s based on R. Function P r takes on the form For a positive constant c, the function f (x) = − ln(x)/(− ln(cx)) r , x ∈ (0, 1), possesses the unique maximum point x = c 1/(r −1) . Hence, settinĝ and choosing an arbitrary functionβ : (0, ∞) r < → R + , we obtain thatθ = (α,β) satisfies (6). Thus, by Theorem 1, the unique MPSP of R s based on R is given by In view of the fact that the Pareto distribution often allows for adequate modeling of quantities spanning many orders of magnitude, it seems natural to evaluate the M P S P )) = 0.

Real data example
In this section, we illustrate the practical applicability of the proposed prediction procedure on a dataset of water level measurements. Extreme water levels may have a major environmental impact and, due to potential flood situations, pose a serious threat to the human population. For our analysis, we consider data collected by the German Federal Office of Hydrology (FOH) in its role as a scientific advisor to the Federal Waterways and Shipping Administration, publicly available at https://www.  pegelonline.wsv.de/gast/start. For measurements data older than 30 days, one has to contact the FOH directly (www.bafg.de). The data set contains hourly measurements (in cm) of water level for the time period from January 1918 to February 2019 collected at the measurement site Cuxhaven-Steubenhöft located at the river Elbe. In order to approximately meet the i.i.d.-assumption in our record model, we calculated the weekly maximum water levels based on the hourly data, which then served as the basis for prediction. In addition, we retained only those measurements exceeding 690 cm. We show in Fig. 1 the histogram of the full dataset of weekly maximum water levels as well as the histogram of the weekly maximum water levels above 690 cm. To assess the distributional properties of the dataset of water levels above 690 cm, we use a Pareto Q-Q plot (see Fig. 2). Apart from the last few points, the Pareto Q-Q plot is more or less linear indicating a reasonable fit, at least for the purpose of this illustrative example, of the Pareto distribution to the tail of the weekly maximum water levels. The maximum likelihood estimate of the shape parameter isβ = 16.9. In Fig. 1b, the Pareto density function with parameterβ is plotted. The sequence of record values extracted from the dataset of weekly maximum water levels exceeding 690 cm is given by 713, 781, 880, 885, 901, 914, 915, 993, 1010. We applied the maximum product of spacings prediction procedure for Pareto record values (see Sect. 4.2) to predict the subsequent record water level R r +1 based on the preceding r observed record water levels by successively increasing the sample size r from 2 up to 9. The results are reported in Table 1. From the results we observe that the MPSP is able to capture the magnitude of the observed record water levels, and this even more so, the larger the sample size. appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.