1 Introduction

For an n-sample \(X_1,X_2,\ldots ,X_n\) from a continuous distribution function F, let \(X_{1,n} \le \cdots \le X_{n,n}\) be the corresponding order statistics. Recall that the k-record process is defined in terms of the kth largest observations; see Dziubdziela and Kopocinski [10]. For any positive integer k, let

$$\begin{aligned} \nu ^{(k)}_{1}=k,\ \ \nu ^{(k)}_{i+1}=\min \left\{ j>\nu ^{(k)}_{i}: X_{j-k+1,j}>X_{\nu ^{(k)}_{i}-k+1,\nu ^{(k)}_{i}}\right\} , \end{aligned}$$

and the k-record values are then defined by \(R^{(k)}_{i}=X_{\nu ^{(k)}_{i}-k+1,\nu ^{(k)}_{i}},\ \ i\ge 1\). Next, suppose that F belongs to the max-domain of attraction of an extreme value distribution \(G_{\gamma }\) (\(F \in D(G_{\gamma })\)), where \(\gamma \in {\mathbb {R}}\) is the extreme value index. That is, there exist sequences \(a_n > 0\) and \(b_n \in {\mathbb {R}}\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}\left( a_n^{-1}(X_{n,n}-b_{n})\le x\right) =G_{\gamma }(x)=\exp \left[ -(1 +\gamma x)^{-1/\gamma }\right] , \end{aligned}$$
(1)

where \(1 + \gamma x > 0\). Let \(U(y) =\inf \{z: 1-F(z) \le 1/y\}\), for \(y \ge 1\). In terms of U, the first order condition (1) is equivalent, for all \(x>0\), to

$$\begin{aligned} { \lim \limits _{t\rightarrow \infty }\frac{U(tx)-U(t)}{a(t)}=\frac{x^\gamma -1}{\gamma },} \end{aligned}$$
(2)

where \(a(\cdot )\) is a suitable positive auxiliary function. It can be proved that (1) or (2) is equivalent to

$$\begin{aligned} \lim _{t\rightarrow t_*}{\mathbb {P}}(X_1\le \sigma (t)x +t|X_1>t)=D_{\gamma }(x)=1-(1 +\gamma x)^{-1/\gamma }, \end{aligned}$$
(3)

where \(1 + \gamma x > 0\), \(\sigma (\cdot )>0\) is a positive function, and \(t_{*} = \sup \{y: F(y)< 1\}\le \infty\) is the right endpoint of F. The function \(D_{\gamma }\) is known as the Generalized Pareto Distribution. See [6] for a more theoretical discussion of the max-domain of attraction.
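As a small computational aside, \(G_{\gamma }\) and \(D_{\gamma }\) can be evaluated as follows, with the \(\gamma = 0\) case taken by continuity (a minimal sketch; the function names are ours, not from the paper):

```python
import math

def gev_cdf(gamma, x):
    """Extreme value df G_gamma(x); gamma = 0 is the Gumbel limit exp(-e^{-x})."""
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    if 1.0 + gamma * x <= 0.0:
        # outside the support: left of it when gamma > 0, right of it when gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-(1.0 + gamma * x) ** (-1.0 / gamma))

def gpd_cdf(gamma, x):
    """Generalized Pareto df D_gamma(x) from (3); gamma = 0 gives 1 - e^{-x}."""
    if gamma == 0.0:
        return 1.0 - math.exp(-x)
    if 1.0 + gamma * x <= 0.0:
        return 0.0 if gamma > 0 else 1.0
    return 1.0 - (1.0 + gamma * x) ** (-1.0 / gamma)
```

Both functions are continuous in \(\gamma\), which is what justifies handling \(\gamma = 0\) by continuity later on.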

The problem of estimating the extreme value index \(\gamma\) on the basis of the largest observations of the sample \((X_1,\ldots , X_n)\) has received considerable attention in classical extreme value theory. Many estimators based on the upper order statistics have been proposed for \(\gamma\), such as Hill’s estimator [14], Pickands’s estimator [16], the moment estimator derived by Dekkers et al. [8], and the maximum likelihood (ML) estimator suggested by Drees et al. [9]. For further information, [6] and [3] give good introductions that are rich in applications and provide many theoretical and practical details on the estimation of the extreme value index. On the other hand, the example of Resnick’s duality theorem [2, Theorem 2.3.3] and the characterization of tail distributions [11] show that extreme value theory is closely linked to the theory of record values. Recent developments in record theory can be found in [13] and [1]. When only k-records are observed, the conventional estimators of extreme value statistics cannot be applied, and the construction of estimators based on record values therefore becomes essential [12]. This problem has not been sufficiently studied in the literature; it was revisited recently by Louzaoui and El Arrouchi [15] using a maximum likelihood (ML) approach based on the top \(k+1\) k-records.
More precisely, for \(k=k_m\) an intermediate sequence of integers satisfying \(k_m \rightarrow \infty\) and \(k_m/m\rightarrow 0\) as \(m\rightarrow \infty\), where m is the number of observed k-records, they showed that the conditional joint distribution of \(\left( R_{m-k+1}^{(k)}-R_{m-k}^{(k)},\ldots ,R_{m}^{(k)}-R_{m-k}^{(k)}\right)\) given \(R_{m-k}^{(k)}= y\) is the same as the unconditional joint distribution of the k-record values \(\left( Z^{(k)}_{1}, \ldots ,Z^{(k)}_{k}\right)\) from independent and identically distributed random variables \(Z_1,Z_2,\ldots\) with distribution \(F_y(z)= (F(z)-F(y))/(1-F(y))\ \ (z>y)\) (left-truncated distribution), which can be replaced, in view of (3), by \(D_{\gamma }(\cdot /\sigma )\) (Generalized Pareto Distribution). This result can be used to construct a pseudo maximum likelihood estimator \(({{\hat{\gamma }}},{{\hat{\sigma }}})\) of the unknown parameters \((\gamma ,\sigma )\); that is, based on the sample of k-record values \(\left( Z^{(k)}_{1}, \ldots ,Z^{(k)}_{k}\right)\), we can maximize the likelihood function

$$\begin{aligned} L = k^{k} \left( 1-D_{\gamma }(Y_{1}/\sigma )\right) ^{k}\prod _{i=1}^{k} \frac{d_{\gamma ,\sigma } (Y_{i})}{1-D_{\gamma }(Y_{i}/\sigma )}, \end{aligned}$$

with \(Y_{i} = R^{(k)}_{m-i+1}- R^{(k)}_{m-k}\), \(1 \le i\le k\), and \(d_{\gamma ,\sigma }(y)= \partial D_{\gamma }(y/\sigma )/\partial y\). Consequently, \({\hat{\gamma }}:\equiv {\hat{\gamma }}_{m}(k)\) and \({\hat{\sigma }}:\equiv {\hat{\sigma }}_{m}(k)\) are obtained by solving the likelihood equations

$$\begin{aligned} \left\{ \begin{aligned}&\log \left( 1+\frac{\gamma }{\sigma } Y_1\right) =\gamma , \\&\left( 1+\frac{1}{\frac{\gamma }{\sigma }Y_1}\right) \cdot \frac{1}{k} \sum \limits _{i=1}^{k}\frac{1}{1+\frac{\gamma }{\sigma }Y_i}=1/\gamma . \end{aligned} \right. \end{aligned}$$
(4)

with \(Y_{i} = R^{(k)}_{m-i+1}- R^{(k)}_{m-k}\). For \(\gamma = 0\), the equations are obtained by continuity. Put \(h_m(t):=q_m(t)\, g_m(t)-1\), where \(g_m(t):= \frac{1}{k}\sum \nolimits _{i=1}^{k}{\frac{1}{1+tY_i}}\), \(q_m(t):= \left( 1+\frac{1}{tY_1}\right) f_m(t)\) and \(f_m(t):= \log \left( 1+tY_1\right)\). Any solution \(({\hat{\gamma }},{\hat{\sigma }})\) of (4) satisfies \(h_m({\hat{\gamma }}/{\hat{\sigma }})=0\). Conversely, \(({\hat{\gamma }},{\hat{\sigma }})=(f_m(t^*),f_m(t^*)/t^*)\) is a solution of (4) for any non-zero solution \(t^*\) of \(h_m(t)=0\). It is easily seen that \(h_m(t)=0\) admits the trivial solution \(t=0\), which must be discarded even when, in fact, \(\gamma =0\).
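In practice, a non-zero root \(t^*\) of \(h_m(t)=0\) can be located numerically and converted into \(({\hat{\gamma }},{\hat{\sigma }})=(f_m(t^*),f_m(t^*)/t^*)\). The following is a minimal sketch for the case \(\gamma >0\), not the authors' implementation: it scans a geometric grid for a sign change of \(h_m\) (skipping the trivial root \(t=0\)) and then bisects.

```python
import math

def ml_record_estimates(Y, t_grid=None):
    """Pseudo-ML estimates (gamma_hat, sigma_hat) from the spacings
    Y[i-1] = Y_i = R^{(k)}_{m-i+1} - R^{(k)}_{m-k}, with Y[0] = Y_1 the
    largest spacing.  A non-zero root of h_m(t) = q_m(t) g_m(t) - 1 is
    bracketed on a grid and refined by plain bisection (gamma > 0 case)."""
    k = len(Y)
    f = lambda t: math.log1p(t * Y[0])                       # f_m(t)
    g = lambda t: sum(1.0 / (1.0 + t * y) for y in Y) / k    # g_m(t)
    h = lambda t: (1.0 + 1.0 / (t * Y[0])) * f(t) * g(t) - 1.0
    if t_grid is None:
        t_grid = [1e-4 * 1.2 ** i for i in range(120)]
    bracket = None
    for a, b in zip(t_grid, t_grid[1:]):
        if h(a) * h(b) < 0:                                  # sign change found
            bracket = (a, b)
            break
    if bracket is None:
        raise ValueError("no sign change of h_m on the grid")
    a, b = bracket
    for _ in range(200):                                     # bisection
        c = 0.5 * (a + b)
        if h(a) * h(c) <= 0:
            b = c
        else:
            a = c
    t_star = 0.5 * (a + b)
    gamma_hat = f(t_star)
    return gamma_hat, gamma_hat / t_star
```

For \(\gamma <0\), the root is negative and the grid would have to be adapted accordingly.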

Under the first order condition (2), Louzaoui and El Arrouchi [15] have shown the existence of a random N such that the likelihood equations have a consistent solution \(({\hat{\gamma }}_{m},{\hat{\sigma }}_{m})\) for all \(m\ge N\). Here, we study their asymptotic normality under the so-called second order condition and derive another estimator of the extreme value index which is asymptotically unbiased and normal. The remainder of this paper is organized as follows. In Sect. 2, we establish the asymptotic normality of the ML estimators for \(\gamma \ne 0\) and then propose a bias correction. Section 3 is devoted to some numerical studies, with discussion, which lend further support to our theoretical results. Finally, in Sect. 4, a real data set is analyzed using the suggested methods.
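Before moving to the main results, the k-record construction of this introduction can be made concrete: the following sketch extracts the k-record values from an observed sample (naive \(O(n^2\log n)\) recomputation, for clarity; the function name is ours):

```python
def k_record_values(x, k):
    """Extract the k-record values R_i^{(k)} from the sample x.

    A new k-record occurs at time j >= k whenever the k-th largest
    observation among x[0], ..., x[j-1] strictly exceeds the previous
    k-th largest; the first k-record is the minimum of the first k values."""
    records = []
    kth_largest = None
    for j in range(k, len(x) + 1):
        cur = sorted(x[:j])[j - k]  # X_{j-k+1,j}: the k-th largest of the first j
        if kth_largest is None or cur > kth_largest:
            kth_largest = cur
            records.append(cur)
    return records
```

For k = 1 this reduces to the usual sequence of upper records.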

2 Main results

The study of the asymptotic normality of the ML estimators requires a second order condition, which is a refinement of (2); see [6]. For \(\gamma\) positive, suppose there exist an auxiliary function A(t) (of constant sign, with \(A(t) \rightarrow 0\) as \(t\rightarrow \infty\)) and a real index \(\rho \le 0\) such that, for all \(x>0\),

$$\begin{aligned} { \lim \limits _{t\rightarrow \infty }\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}= \frac{x^{\rho }-1}{\rho }.} \end{aligned}$$
(5)

The parameter \(\rho\) governs the rate of convergence in (2). It can be shown that necessarily \(|A| \in RV_{\rho }\). The parameter \(\rho\) is of primary importance in the adaptive choice of the threshold to be considered in the estimation of the extreme value index [6, 12]. For \(\gamma <0\), this condition becomes, for all \(x>0\),

$$\begin{aligned} {\lim \limits _{t\rightarrow \infty }\frac{\log (U(\infty )- U(tx))-\log (U(\infty )- U(t))-\gamma \log x}{A(t)}= \frac{x^{\rho }-1}{\rho }.} \end{aligned}$$
(6)

We now state our main result, which establishes the asymptotic normality of the ML estimators.

Theorem 2.1

Suppose that the second order condition (5) or (6) holds and suppose \(k=k_m\rightarrow \infty\), \(k/m\rightarrow 0\), \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\), as \(m\rightarrow \infty\).

  • (1) If \(\lim \nolimits _{m\rightarrow \infty }\sqrt{k}A(e^{{m/k}})=\lambda \in {\mathbb {R}}\), then as \(m\rightarrow \infty\)

    • (i) \(\sqrt{k}({\hat{\gamma }}_m(k)-\gamma ) {\mathop {\longrightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda \mu (\rho ),\gamma ^{2}\right)\), with \(\mu (\rho )=\frac{1-e^{-\rho }}{\rho }\).

    • (ii) When (\(-\rho <\gamma\)) or (\(-\rho>\gamma >0,\ \lim _{t\rightarrow \infty }a(t)-\gamma U(t)=0\)) or (\(\gamma <0\)), we have as \(m\rightarrow \infty\)

      $$\begin{aligned} \sqrt{k}\left( {\hat{\gamma }}_m(k)-\gamma ,\sqrt{\frac{k}{m-k}}\left( \frac{{\hat{\sigma _m}}(k)}{a\left( e^{(m/k)-1}\right) }-1\right) \right) {\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda b_{\gamma ,\rho },\Sigma \right) , \end{aligned}$$

with \({{\mathcal {N}}}\) denoting the normal distribution, \(b_{\gamma ,\rho }=(\mu (\rho ),0)\), and covariance matrix \(\Sigma = \left( \begin{array}{cc} \gamma ^2 & 0 \\ 0 & \gamma ^2 \end{array} \right)\).

  • (2) If \(\lim \nolimits _{m\rightarrow \infty }\sqrt{k}|A(e^{{m/k}})|=+\infty\), then as \(m\rightarrow \infty\)

    $$\begin{aligned} \left( A(e^{{m/k}})\right) ^{-1}({\hat{\gamma }}_m(k)-\gamma ){\mathop {\longrightarrow }\limits ^{p}}\mu (\rho ). \end{aligned}$$

Proof

Let \(\{E_i,i\ge 1\}\) be a sequence of independent and identically distributed standard exponential random variables and \(S_j = E_1 + \cdots + E_j,\ j \ge 1\). Denote the hazard function of F by \(H(x)=-\log (1-F(x))\). It is easily seen that, for \(x\ge 1\), \(U(x)=H^{\leftarrow }(\log (x))\), where \(H^{\leftarrow }\) is a strictly increasing function, since F is continuous. From this and Relation (4.7) in [17], we get the following representation

$$\begin{aligned} \left\{ R_j^{(k)},j\ge 1\right\} {\mathop {=}\limits ^{d}}\left\{ H^{\leftarrow }\left( S_j/k\right) , \ j\ge 1\right\} =\left\{ U\left( e^{S_j/k}\right) , \ j\ge 1\right\} . \end{aligned}$$

Without loss of generality, we can assume that \(R_{m-j}^{(k)}=U\left( e^{\frac{S_{m-j}}{k}}\right)\), where \(0\le j\le k\) and \(m\ge 1\).
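This representation also gives a direct way to simulate k-record sequences without generating the underlying sample, which is convenient for numerical checks. A minimal sketch under the representation above (the function name is ours):

```python
import math
import random

def simulate_k_records(U, m, k, rng=None):
    """Simulate R_1^{(k)}, ..., R_m^{(k)} via R_j^{(k)} = U(exp(S_j / k)),
    where S_j is the sum of j independent standard exponentials."""
    rng = rng or random.Random()
    s, records = 0.0, []
    for _ in range(m):
        s += rng.expovariate(1.0)          # S_j = S_{j-1} + E_j
        records.append(U(math.exp(s / k)))
    return records
```

For instance, \(U(y)=y\) corresponds to the standard Pareto distribution, for which \(\gamma =1\).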

For the case \(\gamma >0\), Louzaoui and El Arrouchi [15] gave bounds for the solution \(t^*\) of \(h_m(t)=0\). More precisely, under the first order condition (2) and when \(\delta _m\rightarrow 0,\ k\rightarrow \infty ,\ k/m\rightarrow 0\) and \(k/\log m\rightarrow \infty\) as \(m\rightarrow \infty\), they proved the existence of a random integer N such that \(h_m(T_m^{(\delta _m)})<0\) and \(h_m(T_m^{(-\delta _m)})>0\) for any \(m\ge N\) almost surely, where \(T_m^{(\delta _m)}:=(1+\delta _m)/R^{(k)}_{m-k}\) (see [15, Lemma 3]). From this, the existence of a random variable \(T_m^*\in [T_m^{(-\delta _m)},T_m^{(\delta _m)}]\) such that, almost surely, \(h_m(T_m^*)=0\) is assured by the intermediate value theorem. Notice that the condition \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\) implies \(k/\log m\rightarrow \infty\).

Let \(W_{m}=\frac{R_{m}^{(k)}}{R_{m-k}^{(k)}}\). We have, as \(m\rightarrow \infty\),

$$\begin{aligned} f_m(T_m^{(\delta _m)})-\gamma =&\log (W_{m}+\delta _{m}(W_{m}-1))-\gamma \\ =&\log W_{m} + \delta _{m}\frac{W_{m}-1}{W_{m}}- \gamma +o_{p}(\delta _{m})\\ =&\log U\left( e^{\frac{S_{m}-S_{m-k}}{k}}e^{\frac{S_{m-k}}{k}}\right) -\log U\left( e^{\frac{S_{m-k}}{k}}\right) -\gamma \frac{S_{m}-S_{m-k}}{k}\\&+\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +\delta _{m}\frac{W_{m}-1}{W_{m}}+o_{p}(\delta _{m}). \end{aligned}$$

From (5) and by Theorem B.2.18 in [6], there exists, for each \(\epsilon >0\), a \(t_{0}=t_{0}(\epsilon )\) such that for \(x\ge 1\) and \(t>t_{0}\),

$$\begin{aligned} {\displaystyle \left|\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}- \frac{x^{\rho }-1}{\rho }\right|\le \epsilon x^{\rho +\epsilon }.} \end{aligned}$$

Take \(t=e^{\frac{S_{m-k}}{k}}\), \(x=e^{\frac{S_{m}-S_{m-k}}{k}}\) and observe that as \(m\rightarrow \infty\), \(t\rightarrow \infty\), \(x\rightarrow e\) and \(\frac{x^{\rho }-1}{\rho }\pm \epsilon x^{\rho +\epsilon }\rightarrow e^{\rho }\mu (\rho )\pm \epsilon e^{\rho +\epsilon }\) almost surely, see Lemma 1 in [15]. We get, for each \(\epsilon >0\), almost surely

$$\begin{aligned} e^{\rho }\mu (\rho )- \epsilon e^{\rho +\epsilon }\le \liminf _m \frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}, \end{aligned}$$

and so

$$\begin{aligned} e^{\rho }\mu (\rho )\le \liminf _m \frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}. \end{aligned}$$

Similarly, \(\limsup\limits_m\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}\le e^{\rho }\mu (\rho )\) almost surely. Thus, as \(m\rightarrow \infty\), almost surely

$$\begin{aligned} \log U(tx)-\log U(t)-\gamma \log x=e^{\rho }\mu (\rho )A(t)+o(A(t)). \end{aligned}$$

Hence, as \(m\rightarrow \infty\)

$$\begin{aligned} \begin{array}{ll} f_m(T_m^{(\delta _m)})-\gamma =&{}\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +A\left( e^{\frac{S_{m-k}}{k}}\right) e^{\rho }\mu (\rho ) \\ {} &{} +o_{p}\left( A\left( e^{\frac{S_{m-k}}{k}}\right) \right) +O_{p}(\delta _m) \end{array} \end{aligned}$$
(7)

Notice that the central limit theorem implies, as \(m\rightarrow \infty\)

$$\begin{aligned} \sqrt{k}\left( \frac{S_{m}-S_{m-k}}{k}-1\right) \overset{d}{\rightarrow }N_1, \end{aligned}$$
(8)

where \(N_{1}\) is a random variable having a standard normal distribution.

On the other hand, by \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\) and using the law of the iterated logarithm, we have as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{S_{m-k}}{k}-\frac{m}{k}{\mathop {\rightarrow }\limits ^{p}} -1, \end{aligned}$$
(9)

and by the fact that \(A \in RV_{\rho }\), we get as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{A\left( e^{\frac{S_{m-k}}{k}}\right) }{A\left( e^{m/k}\right) }{\mathop {\rightarrow }\limits ^{p}} e^{-\rho }. \end{aligned}$$
(10)

Choosing \(\delta _{m}\) such that \(\sqrt{k}\delta _{m}\rightarrow 0\) and combining (7), (8) and (10) with \(\sqrt{k}A(e^{m/k})\longrightarrow \lambda\), we find that \(\sqrt{k}\left( f_m(T_m^{(\delta _m)})-\gamma \right)\) is asymptotically normal with mean \(\lambda \mu (\rho )\) and variance \(\gamma ^{2}\). The same arguments show that \(\sqrt{k}\left( f_m(T_m^{(-\delta _m)})-\gamma \right)\) is asymptotically normal with the same mean and variance. Since \(f_{m}\) is an increasing function, we have for sufficiently large m

$$\begin{aligned} \sqrt{k}\left( f_m(T_m^{(-\delta _m)})-\gamma \right) \le \sqrt{k}\left( f_m(t^{*})-\gamma \right) \le \sqrt{k}\left( f_m(T_m^{(\delta _m)})-\gamma \right) , \end{aligned}$$

which gives the result (i).

To prove the asymptotic normality of \({\hat{\sigma _m}}\), we use the following expansion

$$\begin{aligned} \begin{array}{ll}\frac{{\hat{\sigma _m}}}{a\left( e^{(m-k)/k}\right) }-1= &{} \left( \frac{{\hat{\gamma _m}}}{\gamma }-1\right) \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }\frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) } +\left( \frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }-1\right) \\ &{} +\left( \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }-1\right) \frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }:=T_1+T_2+T_3.\end{array} \end{aligned}$$

First consider \(T_1\). For sufficiently large m, we have almost surely

$$\begin{aligned} (1-\delta _m)\frac{a\left( e^{S_{m-k}/k}\right) }{\gamma U\left( e^{S_{m-k}/k}\right) }\le \frac{t^*a\left( e^{S_{m-k}/k}\right) }{\gamma }\le (1+\delta _m)\frac{a\left( e^{S_{m-k}/k}\right) }{\gamma U\left( e^{S_{m-k}/k}\right) }. \end{aligned}$$
(11)

From Lemma 1.2.9 in [6], we have \(a(t)\sim \gamma U(t)\) as \(t\rightarrow \infty\). Then from (11) we get that \(\frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }\overset{p}{\rightarrow }1\) as \(m\rightarrow \infty\). Since \(a\in RV_{\gamma }\), (9) implies \(\frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }{\mathop {\rightarrow }\limits ^{p}} 1\) as \(m\rightarrow \infty\). Hence, as \(m\rightarrow \infty\)

$$\begin{aligned} T_1=o_p\left( \frac{\sqrt{m-k}}{k}\right) . \end{aligned}$$

Next consider \(T_2\). We have by Theorem 2.3.3 in [6]

$$\begin{aligned} \displaystyle \lim _{t\rightarrow \infty }\frac{a(tx)/a(t)-x^{\gamma }}{A(t)}=\frac{x^{\rho }-1}{\rho }x^{\gamma }, \end{aligned}$$

and consequently, for each \(\epsilon >0\), there exists a \(t_{0}=t_{0}(\epsilon )\) such that for \(t>t_{0}\), \(x\ge 1\),

$$\begin{aligned} \left|\frac{a(tx)/a(t)-x^{\gamma }}{A_0(t)}-x^{\gamma }\frac{x^{\rho }-1}{\rho }\right|\le \epsilon x^{\gamma +\rho +\epsilon },\ \text {with}\ A_0(t)\sim A(t),\ t\rightarrow \infty . \end{aligned}$$

Take \(t=e^{(m-k)/k}\) and \(x=e^{(S_{m-k}-(m-k))/k}\) and observe again that as \(m\rightarrow \infty\), \(t\rightarrow \infty\), \(x\rightarrow 1\) and \(x^{\gamma } \frac{x^{\rho }-1}{\rho }\pm \epsilon x^{\gamma +\rho +\epsilon }\rightarrow \pm \epsilon\) almost surely. We get

$$\begin{aligned} \begin{array}{ll}T_2&{}= e^{\gamma (S_{m-k}-(m-k))/k}-1+A_0\left( e^{(m-k)/k}\right) O_p(1)\\ {} &{} =\gamma (S_{m-k}-(m-k))/k+o_p(\sqrt{m-k}/k)+A_0\left( e^{(m-k)/k}\right) O_p(1), \end{array} \end{aligned}$$

and so, by the central limit theorem, as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{k}{\sqrt{m-k}}T_2\overset{d}{\rightarrow }\gamma N_2. \end{aligned}$$

Notice that, without loss of generality, we can take \(N_1\) and \(N_2\) to be independent random variables.

Thirdly, consider \(T_3\). From (11), we have for sufficiently large m,

$$\begin{aligned} B_{\delta _m}:=\frac{1}{1+\delta _m}\left( \frac{\gamma U\left( e^{S_{m-k}/k}\right) }{a\left( e^{S_{m-k}/k}\right) }-1\right) -\frac{\delta _m}{1+\delta _m}\le \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }-1\le B_{-\delta _m}. \end{aligned}$$

Next, adapting Lemma 4.5.4 in [6] to the case where \(\gamma\) is positive, we get that, if (\(\gamma >-\rho\)) or (\(0<\gamma <-\rho\) and \(\lim _{t\rightarrow \infty }a(t)-\gamma U(t)=0\)), then

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{1-\frac{\gamma U(t)}{a(t)}}{A(t)}=\frac{1}{\rho +\gamma }. \end{aligned}$$
(12)

Combining this with \(\sqrt{k}\delta _{m}\rightarrow 0\), we have as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{k}{\sqrt{m-k}}B_{\delta _m}=\frac{k}{\sqrt{m-k}}B_{-\delta _m}=o_p(1). \end{aligned}$$

Thus, as \(m\rightarrow \infty\)

$$\begin{aligned} T_3=o_p\left( \frac{\sqrt{m-k}}{k}\right) . \end{aligned}$$

Finally, the combination of the three parts proves (ii).

The proof for \(\gamma <0\) is the same as before, with slight modifications. It was proved by Louzaoui and El Arrouchi [15] that if \(k\rightarrow \infty ,\ k/m\rightarrow 0\) and \(k/\log m\rightarrow \infty\) as \(m\rightarrow \infty\), then the first order condition (2) ensures the existence of a solution \(t^{*}\) of \(h_m(t)=0\) such that, almost surely, \(T_m^{(-\delta _m)}<t^{*}<T_m^{(\delta _m)}\) for some small \(\delta _m>0\), where \(T_m^{(\delta _m)}:=-\frac{1+\delta _m}{U(\infty )-R^{(k)}_{m-k}}\). Similarly to (7), we have as \(m\rightarrow \infty\)

$$\begin{aligned} \begin{array}{ll}f_m(T_m^{(\pm \delta _m)})-\gamma =&{}\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +A\left( e^{\frac{S_{m-k}}{k}}\right) e^{\rho }\mu (\rho ) \\ {} &{} +o_{p}\left( A\left( e^{\frac{S_{m-k}}{k}}\right) \right) +O_{p}(\delta _m) \end{array}. \end{aligned}$$

The rest is similar, except that for \(\gamma <0\) the relation (12) becomes

$$\begin{aligned} \displaystyle \lim _{t\rightarrow \infty }\frac{\frac{U(\infty )- U(t)}{a(t)}+1/\gamma }{A(t)}=\frac{1}{\gamma (\rho +\gamma )}, \end{aligned}$$

provided the second order condition holds. Finally, statement (2) follows directly from (7) and (10). \(\square\)

Remark 2.2

Notice that, for \(0<\gamma <-\rho\), the relation (12) is a special case of the general relation

$$\begin{aligned} \lim _{t\rightarrow \infty }\dfrac{1-\frac{\gamma U(t)}{a(t)}-\displaystyle \lim _{x\rightarrow \infty }(a(x)/\gamma - U(x))}{A(t)}=\frac{1}{\gamma +\rho }, \end{aligned}$$

which, unfortunately, for \(\displaystyle \lim _{x\rightarrow \infty }(a(x)/\gamma - U(x))\ne 0\), does not ensure the desired approximation of \(T_3\). A similar remark can be made for \(\gamma =-\rho\).

In order to obtain an unbiased estimator of \(\gamma\), it can be seen from the asymptotic expansion (7) that it is necessary to eliminate the term \(A(e^{m/k})\) and to replace \(\rho\) by a consistent estimator. Define, for integers \(n\ge k\ge 1\),

$$\begin{aligned} {\widetilde{\gamma }}_{n}(.)={\hat{\gamma }}_{N^{(k)}(n)}([.]), \ \ \displaystyle {\hat{\rho }}_n=(\log 2)^{-1}\log \frac{{\widetilde{\gamma }}_{n}(n/(2\log n))-{\widetilde{\gamma }}_{n}(n/(4\log n))}{{\widetilde{\gamma }}_{n}(n/\log n)-{\widetilde{\gamma }}_{n}(n/(2\log n))} \end{aligned}$$

and

$$\begin{aligned} {\bar{\gamma }}_n(k)={\widetilde{\gamma }}_{n}(k)-\frac{{\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)}{1-4^{{\hat{\rho }}_n}}, \end{aligned}$$

where \(N^{(k)}(n)\) denotes the number of k-record values in the sequence \(X_1,\ldots ,X_n\) and [x] is the largest integer less than or equal to x. Then we have the following theorem.
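Given any routine implementing \(k\mapsto {\widetilde{\gamma }}_{n}(k)\), the estimators \({\hat{\rho }}_n\) and \({\bar{\gamma }}_n(k)\) are direct to compute. A minimal sketch, where gamma_tilde is an assumed black box for \({\widetilde{\gamma }}_{n}\):

```python
import math

def rho_hat(gamma_tilde, n):
    """Estimator of rho from three values of gamma_tilde, as defined above;
    gamma_tilde(k) is assumed to implement k -> tilde{gamma}_n(k)."""
    g1 = gamma_tilde(int(n / math.log(n)))
    g2 = gamma_tilde(int(n / (2.0 * math.log(n))))
    g3 = gamma_tilde(int(n / (4.0 * math.log(n))))
    return math.log((g2 - g3) / (g1 - g2)) / math.log(2.0)

def gamma_bar(gamma_tilde, k, rho):
    """Bias-corrected estimator bar{gamma}_n(k) for a given (estimated) rho."""
    return gamma_tilde(k) - (gamma_tilde(k) - gamma_tilde(k // 4)) / (1.0 - 4.0 ** rho)
```

As a sanity check on the bias mechanism: for the toy model \({\widetilde{\gamma }}_{n}(k)= \gamma +k/n\) (corresponding to \(\rho =-1\)), rho_hat returns approximately \(-1\) and gamma_bar removes the bias almost exactly.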

Theorem 2.3

Assume (5) holds for \(\rho <0\). Assume \(k=k_n\rightarrow \infty\), \(k/n\rightarrow 0\), \(\log (n/k)\log \log n=o(k)\) and \(k/\log n\rightarrow \infty\) as \(n\rightarrow \infty\).

  • (i) If \(\lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda \in {\mathbb {R}}\), then as \(n\rightarrow \infty\)

    $$\begin{aligned}\sqrt{k}({{\widetilde{\gamma }}}_n(k)-\gamma ){\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda \mu (\rho ),\gamma ^{2}\right),\end{aligned}$$

    and

    $$\begin{aligned} \sqrt{k}({{\bar{\gamma }}}_n(k)-\gamma ){\mathop {\rightarrow }\limits ^{d}} \mathcal N\left( 0,{{\tilde{\sigma }}}^{2}\right) , \end{aligned}$$

    with \(\displaystyle {{\tilde{\sigma }}}^{2}=\frac{(1-4^{\rho })^2+4^{2\rho }}{2(1-4^{\rho })^2}\gamma ^2\).

  • (ii) If \(\lim _{n\rightarrow \infty }\sqrt{k}|A(n/k)|=\infty\), then as \(n\rightarrow \infty\)

    $$\begin{aligned} (A(n/k))^{-1}({{\widetilde{\gamma }}}_n(k)-\gamma ){\mathop {\longrightarrow }\limits ^{p}}\mu (\rho ). \end{aligned}$$

Remark 2.4

Notice that if \(\rho \ge -1/2\), then \({{\tilde{\sigma }}}^{2} \ge \gamma ^2\), and if \(\rho \le -1/2\), then \({{\tilde{\sigma }}}^{2} \le \gamma ^2\).

Proof

We use the same arguments as in the proof of Theorem 2.1. We can write, for \(\delta _m=o(1/\sqrt{k})\) as \(m\rightarrow \infty\),

$$\begin{aligned} {\hat{\gamma }}_m(k)-\gamma =\gamma \left( \frac{S_{k}}{k}-1\right) +A\left( e^{\frac{m}{k}}\right) \mu (\rho )+o_{p}\left( A\left( e^{\frac{m}{k}}\right) \right) +o_{p}\left(\frac{1}{\sqrt{k}}\right). \end{aligned}$$
(13)

By Proposition 2.2 in [7], we have, as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n) - k \log (n/k) = B(k \log (n/k)) + O(\log (k \log (n/k))), \end{aligned}$$

where \(\{B(t), t\ge 0\}\) is a standard Brownian motion, and by Theorem 1.3.1 in [5], we get as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n)-k \log (n/k) = O\left( (k \log (n/k) \log (k \log (n/k)))^{1/2}\right) . \end{aligned}$$

Furthermore, since \(\log (k \log (n/k)) < \log \log n\), we have as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n)-k\log (n/k) = O((k \log (n/k) \log \log n)^{1/2}). \end{aligned}$$

Combining this with the fact \(A\in RV_\rho\), we get from (13)

$$\begin{aligned} {\hat{\gamma }}_{N^{(k)}(n)}(k)-\gamma =\gamma \left( \frac{S_{k}}{k}-1\right) +A\left( {n/k}\right) \mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/k}\right) \right) , \end{aligned}$$

and so, for \(0 < s\le 1\)

$$\begin{aligned} {\hat{\gamma }}_{N^{([sk])}(n)}([sk])-\gamma =\gamma \left( \frac{S_{[sk]}}{[sk]}-1\right) +A\left( {n/[sk]}\right) \mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/[sk]}\right) \right) , \end{aligned}$$

which, by Donsker’s invariance principle, gives for \(0<s\le 1\) and \(n\rightarrow \infty\),

$$\begin{aligned} {{\widetilde{\gamma }}}_{n}(sk)-\gamma =\gamma \frac{B(\sqrt{s})}{\sqrt{k}}+A\left( {n/k}\right) s^{-\rho }\mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/k}\right) \right) . \end{aligned}$$
(14)

This gives the first part of (i) and (ii).

Next, observe that if \(k\in \{n/\log n,n/(2\log n),n/(4\log n)\}\), then all conditions on the sequence k are fulfilled with \(\lim _{n\rightarrow \infty }\sqrt{k}|A(n/k)|=\infty\). Since, as \(n\rightarrow \infty\), \(A(2 \log n) \sim 2^{\rho } A(\log n)\) and \(A(4 \log n) \sim 4^{\rho } A(\log n)\), we have from the statement (ii)

$$\begin{aligned} \begin{array}{ll}{\hat{\rho }}_n&{}\displaystyle =(\log 2)^{-1}\log \frac{(A(\log n))^{-1}\left\{ {\widetilde{\gamma }}_{n}(n/(2\log n))-\gamma -({\widetilde{\gamma }}_{n}(n/(4\log n))-\gamma )\right\} }{(A(\log n))^{-1}\left\{ {\widetilde{\gamma }}_{n}(n/\log n)-\gamma -({\widetilde{\gamma }}_{n}(n/(2\log n))-\gamma )\right\} }\\ \\ &{}\displaystyle {\mathop {\rightarrow }\limits ^{p}}(\log 2)^{-1}\log \frac{2^{\rho }-4^{\rho }}{1-2^{\rho }}=\rho , \ \text {as}\ n\rightarrow \infty .\end{array} \end{aligned}$$

From (14) we deduce that if \(\displaystyle \lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda\),

$$\begin{aligned} \sqrt{k}\Big ({\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)\Big ){\mathop {\rightarrow }\limits ^{d}}\gamma (B(1)-B(1/2))+\lambda (1-4^{\rho })\mu (\rho ), \ \text {as}\ n\rightarrow \infty , \end{aligned}$$

and so, as \(n\rightarrow \infty\),

$$\begin{aligned} \begin{array}{ll}\displaystyle \sqrt{k} \left( {\widetilde{\gamma }}_{n}(k)-\frac{{\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)}{1-4^{{\hat{\rho }}_n}}\right) &{} \displaystyle {\mathop {\rightarrow }\limits ^{d}}\gamma B(1)-\frac{\gamma }{1-4^{\rho }} (B(1)-B(1/2))\\ &{}\displaystyle =\frac{\gamma }{1-4^{\rho }}\left[ (1-4^{\rho })B(1/2)-4^{\rho }(B(1)-B(1/2))\right] ,\end{array} \end{aligned}$$

which gives the second part of (i). \(\square\)

Corollary 2.5

Assume the conditions of Theorem 2.3 hold. Let \(MSE({{\widetilde{\gamma }}}_n(k))\) and \(MSE({{\bar{\gamma }}}_n(k))\) be the mean square errors of \({{\widetilde{\gamma }}}_n(k)\) and \({{\bar{\gamma }}}_n(k)\), respectively. If \(\lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda \in {\mathbb {R}}\), then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{MSE({{\widetilde{\gamma }}}_n(k))}{MSE({\bar{\gamma }}_n(k))}\left\{ \begin{array}{lll}\ge 1 &{}\text {if}&{} \displaystyle \rho \le -1/2, \ \text {or} \ \rho \ge -1/2 \ \text {and} \ |\lambda |\ge \frac{\gamma }{(1-4^{\rho })\mu (\rho )}\sqrt{\frac{2\cdot 4^{\rho }-1}{2}}, \\ \le 1&{}\text {if} &{} \displaystyle \rho \ge -1/2 \ \text {and} \ |\lambda |\le \frac{\gamma }{(1-4^{\rho })\mu (\rho )}\sqrt{\frac{2\cdot 4^{\rho }-1}{2}}.\end{array}\right. \end{aligned}$$

Proof

This follows easily from Theorem 2.3. \(\square\)

Remark 2.6

  1. In the same way, we can obtain results similar to those of Theorem 2.3 and Corollary 2.5 in the case where \(\gamma <0\).

  2. The method used here cannot work for \(\gamma = 0\), because the bounds found in [15] are almost surely constant and therefore not asymptotically normal.

  3. This method is not applicable to the bounds proposed by Zhou [18], since they are not symmetric.

3 Simulation results

We now present some numerical results for the proposed bias correction. We consider the Generalized Pareto distribution with \(F(x)=1-(1+\frac{\gamma }{\sigma } x)^{-1/\gamma }\), for \(x\ge 0\) and \(\gamma , \ \sigma >0\), the Burr IV distribution with \(F(x)= \{(\alpha /x-1)^{1/\alpha }+1\}^{-\beta }\), \(0<x<\alpha\), \(\beta >0\), and the standard Cauchy distribution with \(F(x)=\frac{1}{2}+\frac{1}{\pi } \arctan (x)\), for \(x\in {\mathbb {R}}\). For each of these distributions, we generate a random sample of size n, pick up the record values and compute the corresponding estimates. The simulation results are reported in Tables 1, 2, 3 and 4, where \(\overline{{\tilde{\gamma }}}\) and \(\overline{{\bar{\gamma }}}\) are the averages of 10,000 estimates of \({\tilde{\gamma }}\) and \({\bar{\gamma }}\), and \(MSE({\tilde{\gamma }})\) and \(MSE({\bar{\gamma }})\) denote their respective mean square errors. The simulated values are calculated for three sizes n against k, chosen so that the number of records \(N^{(k)}(n)\) is reasonable (using the approximation \(N^{(k)}(n)\sim k\log (n/k)\)). We remark that when the mean squared error values were rounded to the fourth decimal place, some values were repeated. We observe that the simulated values of \({\tilde{\gamma }}\) and \({\bar{\gamma }}\) are close to the theoretical value of \(\gamma\), and \({\bar{\gamma }}\) is frequently closer to the theoretical \(\gamma\) than \({\tilde{\gamma }}\). Unfortunately, the comparison of the MSEs did not allow us to conclude; we think this is due to the variability of the number of records in the original sample.

Table 1 Cauchy distribution: \(\gamma = 1\) and \(\rho = -2\)
Table 2 Burr IV distribution: \(\alpha = \beta = 1\) and \(\gamma =\rho = -1\)
Table 3 Generalized Pareto distribution: \(\gamma = 0.5\), \(\sigma = 1\) and \(\rho = -2\)
Table 4 Burr IV distribution: \(\alpha = 0.5\), \(\beta = 1\), \(\gamma =\rho = -0.5\)

4 Real data

In this section, we apply our estimation method to rainfall data collected monthly from 1975 until 2007 at the Melk Zhar station in the Souss Massa region of Morocco (Fig. 1). These estimates are compared with the most commonly used methods: block maxima (GEV, from (1)) and POT (GPD, from (3)). Table 5 shows the estimated parameters of the GEV distribution. The estimated shape parameter is positive, but the 95% confidence interval also extends below zero, which shows that \(\gamma\) is not significantly different from 0 at the 5% significance level. Considering only block maxima when just a few years of observations are available can cause a great waste of data, since there may be more than one extreme measurement in a single block. In general, the peaks-over-threshold approach can be used to obtain improved accuracy: in particular, if the block maxima can be fitted by a GEV distribution, then the excesses over a high threshold t can be fitted by a GPD. Two techniques are used for threshold selection, namely the mean residual life plot and the stability of the parameter estimates. The linearity of the mean residual life and the stability of the GPD parameters are both reached at \(t = 20\), for which the excesses comprise 89 observations. From Table 6, we have \(\hat{\gamma }=2.5\times 10^{-8}\), which is very close to 0. Next, using the equations in (4), Table 7 summarizes our estimates for some selected values of k, which again confirms the closeness of \(\gamma\) to 0. Consequently, the Gumbel model (GEV with \(\gamma =0\)) is a suitable model for our data. This is supported by the diagnostic plots in Fig. 2. Under the Gumbel model, the return level \(z_{\alpha }\) at return period \(1/{\alpha }\) is \(z_{\alpha } =\mu - \sigma \log (-\log (1-\alpha ))\), where \(\mu\) and \(\sigma\) are estimated in Table 8; on average, \(z_{\alpha }\) is exceeded once every \(1/{\alpha }\) years. The return level estimates are given in Table 9.
Hence, the return level estimates indicate that the maximum value 144.6 (the maximum total monthly rainfall recorded at Melk Zhar, see Fig. 1) will not be exceeded in the next 20 years, but will be exceeded in the next 50 years.
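The Gumbel return level formula is straightforward to evaluate. A minimal sketch (the parameter values in the example are illustrative placeholders, not the estimates of Table 8):

```python
import math

def gumbel_return_level(mu, sigma, period):
    """Return level z_alpha for the Gumbel model with alpha = 1/period:
    the level exceeded on average once every `period` years."""
    alpha = 1.0 / period
    return mu - sigma * math.log(-math.log(1.0 - alpha))

# illustrative (not fitted) location and scale parameters
levels = {p: gumbel_return_level(30.0, 20.0, p) for p in (10, 20, 50, 100)}
```

As expected, the return level is increasing in the return period.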

Fig. 1
figure 1

Monthly total rainfall from 1975 until 2007

Table 5 Estimated parameters of the GEV
Table 6 Estimated parameters for POT with threshold=20
Table 7 Estimates for \((\gamma ,\sigma )\) based on equations in (4) by using the rainfall data
Fig. 2
figure 2

Diagnostic plots for the Gumbel model

Table 8 Estimated parameters of the Gumbel model
Table 9 Return level estimates for the rainfall data