1 Introduction

For an n-sample \(X_1,X_2,\ldots ,X_n\) from a continuous distribution function F, let \(X_{1,n} \le \cdots \le X_{n,n}\) be the corresponding order statistics. Recall that the k-record process is defined in terms of the kth largest observations; see Dziubdziela and Kopocinski [10]. For any positive integer k, let

$$\begin{aligned} \nu ^{(k)}_{1}=k,\ \ \nu ^{(k)}_{i+1}=\min \left\{ j>\nu ^{(k)}_{i}: X_{j-k+1,j}>X_{\nu ^{(k)}_{i}-k+1,\nu ^{(k)}_{i}}\right\} , \end{aligned}$$

and the k-record values are then defined by \(R^{(k)}_{i}=X_{\nu ^{(k)}_{i}-k+1,\nu ^{(k)}_{i}},\ \ i\ge 1\). Next, suppose that F belongs to the max-domain of attraction of an extreme value distribution \(G_{\gamma }\) (\(F \in D(G_{\gamma })\)), where \(\gamma \in {\mathbb {R}}\) is the extreme value index. That is, there exist sequences \(a_n > 0\) and \(b_n \in {\mathbb {R}}\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}\left( a_n^{-1}(X_{n,n}-b_{n})\le x\right) =G_{\gamma }(x)=\exp \left[ -(1 +\gamma x)^{-1/\gamma }\right] , \end{aligned}$$
(1)

where \(1 + \gamma x > 0\). Let \(U(y) =\inf \{z: 1-F(z) \le 1/y\}\), for \(y \ge 1\). In terms of U, the first order condition (1) is equivalent, for all \(x>0\), to

$$\begin{aligned} { \lim \limits _{t\rightarrow \infty }\frac{U(tx)-U(t)}{a(t)}=\frac{x^\gamma -1}{\gamma },} \end{aligned}$$
(2)

where \(a(\cdot )\) is a suitable positive auxiliary function. It can be proved that (1) or (2) is equivalent to

$$\begin{aligned} \lim _{t\rightarrow t_*}{\mathbb {P}}(X_1\le \sigma (t)x +t|X_1>t)=D_{\gamma }(x)=1-(1 +\gamma x)^{-1/\gamma }, \end{aligned}$$
(3)

where \(1 + \gamma x > 0\), \(\sigma (\cdot )>0\) is a positive function, and \(t_{*} = \sup \{y: F(y)< 1\}\le \infty\) is the right endpoint of F. The function \(D_{\gamma }\) is known as the Generalized Pareto Distribution. See [6] for a more theoretical discussion of the max-domain of attraction.
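As a small computational aside, \(G_{\gamma }\) and \(D_{\gamma }\) can be evaluated as follows, with the \(\gamma = 0\) case taken by continuity (a minimal sketch; the function names are ours, not from the paper):

```python
import math

def gev_cdf(gamma, x):
    """Extreme value df G_gamma(x); gamma = 0 is the Gumbel limit exp(-e^{-x})."""
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    if 1.0 + gamma * x <= 0.0:
        # outside the support: left of it when gamma > 0, right of it when gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-(1.0 + gamma * x) ** (-1.0 / gamma))

def gpd_cdf(gamma, x):
    """Generalized Pareto df D_gamma(x) from (3); gamma = 0 gives 1 - e^{-x}."""
    if gamma == 0.0:
        return 1.0 - math.exp(-x)
    if 1.0 + gamma * x <= 0.0:
        return 0.0 if gamma > 0 else 1.0
    return 1.0 - (1.0 + gamma * x) ** (-1.0 / gamma)
```

Both functions are continuous in \(\gamma\), which is what justifies handling \(\gamma = 0\) by continuity later on.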

The problem of estimating the extreme value index \(\gamma\) on the basis of the largest observations of the sample \((X_1,\ldots , X_n)\) has received considerable attention in classical extreme value theory. Many estimators based on the upper order statistics have been proposed for \(\gamma\), such as Hill’s estimator [14], Pickands’s estimator [16], the moment estimator derived by Dekkers et al. [8], and the maximum likelihood (ML) estimator suggested by Drees et al. [9]. For further information, [6] and [3] give good introductions that are rich in applications and provide many theoretical and practical details on the estimation of the extreme value index. On the other hand, the example of Resnick’s duality theorem [2, Theorem 2.3.3] and the characterization of tail distributions [11] show that extreme value theory is closely linked to the theory of record values. Recent developments in record theory can be found in [13] and [1]. When only k-records are observed, the conventional estimators of extreme value statistics cannot be applied, and the construction of estimators based on record values therefore becomes essential [12]. This problem has not been sufficiently studied in the literature; it was revisited recently by Louzaoui and El Arrouchi [15] using a maximum likelihood (ML) approach based on the top \(k+1\) k-records.
More precisely, for \(k=k_m\) an intermediate sequence of integers satisfying \(k_m \rightarrow \infty\) and \(k_m/m\rightarrow 0\) as \(m\rightarrow \infty\), where m is the number of observed k-records, they showed that the conditional joint distribution of \(\left( R_{m-k+1}^{(k)}-R_{m-k}^{(k)},\ldots ,R_{m}^{(k)}-R_{m-k}^{(k)}\right)\) given \(R_{m-k}^{(k)}= y\) is the same as the unconditional joint distribution of the k-record values \(\left( Z^{(k)}_{1}, \ldots ,Z^{(k)}_{k}\right)\) from independent and identically distributed random variables \(Z_1,Z_2,\ldots\) with distribution \(F_y(z)= (F(z)-F(y))/(1-F(y))\ \ (z>y)\) (left-truncated distribution), which can be replaced, in view of (3), by \(D_{\gamma }(\cdot /\sigma )\) (Generalized Pareto Distribution). This result can be used to construct a pseudo maximum likelihood estimator \(({{\hat{\gamma }}},{{\hat{\sigma }}})\) of the unknown parameters \((\gamma ,\sigma )\); that is, based on the sample of k-record values \(\left( Z^{(k)}_{1}, \ldots ,Z^{(k)}_{k}\right)\), we can maximize the likelihood function

$$\begin{aligned} L = k^{k} \left( 1-D_{\gamma }(Y_{1}/\sigma )\right) ^{k}\prod _{i=1}^{k} \frac{d_{\gamma ,\sigma } (Y_{i})}{1-D_{\gamma }(Y_{i}/\sigma )}, \end{aligned}$$

with \(Y_{i} = R^{(k)}_{m-i+1}- R^{(k)}_{m-k}\), \(1 \le i\le k\), and \(d_{\gamma ,\sigma }(y)= \partial D_{\gamma }(y/\sigma )/\partial y\). Consequently, \({\hat{\gamma }}:\equiv {\hat{\gamma }}_{m}(k)\) and \({\hat{\sigma }}:\equiv {\hat{\sigma }}_{m}(k)\) are obtained by solving the likelihood equations

$$\begin{aligned} \left\{ \begin{aligned}&\log \left( 1+\frac{\gamma }{\sigma } Y_1\right) =\gamma , \\&\left( 1+\frac{1}{\frac{\gamma }{\sigma }Y_1}\right) \cdot \frac{1}{k} \sum \limits _{i=1}^{k}\frac{1}{1+\frac{\gamma }{\sigma }Y_i}=1/\gamma . \end{aligned} \right. \end{aligned}$$
(4)

with \(Y_{i} = R^{(k)}_{m-i+1}- R^{(k)}_{m-k}\). For \(\gamma = 0\), the equations are obtained by continuity. Put \(h_m(t):=q_m(t)\, g_m(t)-1\), where \(g_m(t):= \frac{1}{k}\sum \nolimits _{i=1}^{k}{\frac{1}{1+tY_i}}\), \(q_m(t):= \left( 1+\frac{1}{tY_1}\right) f_m(t)\) and \(f_m(t):= \log \left( 1+tY_1\right)\). Any solution \(({\hat{\gamma }},{\hat{\sigma }})\) of (4) satisfies \(h_m({\hat{\gamma }}/{\hat{\sigma }})=0\). Conversely, \(({\hat{\gamma }},{\hat{\sigma }})=(f_m(t^*),f_m(t^*)/t^*)\) is a solution of (4) for any non-zero solution \(t^*\) of \(h_m(t)=0\). It is easily seen that \(h_m(t)=0\) admits the trivial solution \(t=0\), which must be discarded even when, in fact, \(\gamma =0\).
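In practice, a non-zero root \(t^*\) of \(h_m(t)=0\) can be located numerically and converted into \(({\hat{\gamma }},{\hat{\sigma }})=(f_m(t^*),f_m(t^*)/t^*)\). The following is a minimal sketch for the case \(\gamma >0\), not the authors' implementation: it scans a geometric grid for a sign change of \(h_m\) (skipping the trivial root \(t=0\)) and then bisects.

```python
import math

def ml_record_estimates(Y, t_grid=None):
    """Pseudo-ML estimates (gamma_hat, sigma_hat) from the spacings
    Y[i-1] = Y_i = R^{(k)}_{m-i+1} - R^{(k)}_{m-k}, with Y[0] = Y_1 the
    largest spacing.  A non-zero root of h_m(t) = q_m(t) g_m(t) - 1 is
    bracketed on a grid and refined by plain bisection (gamma > 0 case)."""
    k = len(Y)
    f = lambda t: math.log1p(t * Y[0])                       # f_m(t)
    g = lambda t: sum(1.0 / (1.0 + t * y) for y in Y) / k    # g_m(t)
    h = lambda t: (1.0 + 1.0 / (t * Y[0])) * f(t) * g(t) - 1.0
    if t_grid is None:
        t_grid = [1e-4 * 1.2 ** i for i in range(120)]
    bracket = None
    for a, b in zip(t_grid, t_grid[1:]):
        if h(a) * h(b) < 0:                                  # sign change found
            bracket = (a, b)
            break
    if bracket is None:
        raise ValueError("no sign change of h_m on the grid")
    a, b = bracket
    for _ in range(200):                                     # bisection
        c = 0.5 * (a + b)
        if h(a) * h(c) <= 0:
            b = c
        else:
            a = c
    t_star = 0.5 * (a + b)
    gamma_hat = f(t_star)
    return gamma_hat, gamma_hat / t_star
```

For \(\gamma <0\), the root is negative and the grid would have to be adapted accordingly.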

Under the first order condition (2), Louzaoui and El Arrouchi [15] have shown the existence of a random N such that the likelihood equations have a consistent solution \(({\hat{\gamma }}_{m},{\hat{\sigma }}_{m})\) for all \(m\ge N\). Here, we study their asymptotic normality under the so-called second order condition and derive another estimator of the extreme value index which is asymptotically unbiased and normal. The remainder of this paper is organized as follows. In Sect. 2, we establish the asymptotic normality of the ML estimators for \(\gamma \ne 0\) and then propose a bias correction. Section 3 is devoted to some numerical studies, with discussion, which lend further support to our theoretical results. Finally, in Sect. 4, a real data set is analyzed using the suggested methods.
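Before moving to the main results, the k-record construction of this introduction can be made concrete: the following sketch extracts the k-record values from an observed sample (naive \(O(n^2\log n)\) recomputation, for clarity; the function name is ours):

```python
def k_record_values(x, k):
    """Extract the k-record values R_i^{(k)} from the sample x.

    A new k-record occurs at time j >= k whenever the k-th largest
    observation among x[0], ..., x[j-1] strictly exceeds the previous
    k-th largest; the first k-record is the minimum of the first k values."""
    records = []
    kth_largest = None
    for j in range(k, len(x) + 1):
        cur = sorted(x[:j])[j - k]  # X_{j-k+1,j}: the k-th largest of the first j
        if kth_largest is None or cur > kth_largest:
            kth_largest = cur
            records.append(cur)
    return records
```

For k = 1 this reduces to the usual sequence of upper records.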

2 Main results

The study of the asymptotic normality of the ML estimators requires a second order condition, which is a refinement of (2); see [6]. For \(\gamma\) positive, suppose there exist an auxiliary function A(t) (of constant sign, with \(A(t) \rightarrow 0\) as \(t\rightarrow \infty\)) and a real index \(\rho \le 0\) such that, for all \(x>0\),

$$\begin{aligned} { \lim \limits _{t\rightarrow \infty }\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}= \frac{x^{\rho }-1}{\rho }.} \end{aligned}$$
(5)

The parameter \(\rho\) governs the rate of convergence in (2). It can be shown that necessarily \(|A| \in RV_{\rho }\). The parameter \(\rho\) is of primary importance in the adaptive choice of the threshold to be considered in the estimation of the extreme value index [6, 12]. For \(\gamma <0\), this condition becomes, for all \(x>0\),

$$\begin{aligned} {\lim \limits _{t\rightarrow \infty }\frac{\log (U(\infty )- U(tx))-\log (U(\infty )- U(t))-\gamma \log x}{A(t)}= \frac{x^{\rho }-1}{\rho }.} \end{aligned}$$
(6)

We now state our main result, which establishes the asymptotic normality of the ML estimators.

Theorem 2.1

Suppose that the second order condition (5) or (6) holds and suppose \(k=k_m\rightarrow \infty\), \(k/m\rightarrow 0\), \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\), as \(m\rightarrow \infty\).

  • (1) If \(\lim \nolimits _{m\rightarrow \infty }\sqrt{k}A(e^{{m/k}})=\lambda \in {\mathbb {R}}\), then as \(m\rightarrow \infty\)

    • (i) \(\sqrt{k}({\hat{\gamma }}_m(k)-\gamma ) {\mathop {\longrightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda \mu (\rho ),\gamma ^{2}\right)\), with \(\mu (\rho )=\frac{1-e^{-\rho }}{\rho }\).

    • (ii) When (\(-\rho <\gamma\)) or (\(-\rho>\gamma >0,\ \lim _{t\rightarrow \infty }a(t)-\gamma U(t)=0\)) or (\(\gamma <0\)), we have as \(m\rightarrow \infty\)

      $$\begin{aligned} \sqrt{k}\left( {\hat{\gamma }}_m(k)-\gamma ,\sqrt{\frac{k}{m-k}}\left( \frac{{\hat{\sigma _m}}(k)}{a\left( e^{(m/k)-1}\right) }-1\right) \right) {\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda b_{\gamma ,\rho },\Sigma \right) , \end{aligned}$$

with \({{\mathcal {N}}}\) denoting the normal distribution, \(b_{\gamma ,\rho }=(\mu (\rho ),0)\), and covariance matrix \(\Sigma = \left( \begin{array}{cc} \gamma ^2 & 0 \\ 0 & \gamma ^2 \end{array} \right)\).

  • (2) If \(\lim \nolimits _{m\rightarrow \infty }\sqrt{k}|A(e^{{m/k}})|=+\infty\), then as \(m\rightarrow \infty\)

    $$\begin{aligned} \left( A(e^{{m/k}})\right) ^{-1}({\hat{\gamma }}_m(k)-\gamma ){\mathop {\longrightarrow }\limits ^{p}}\mu (\rho ). \end{aligned}$$

Proof

Let \(\{E_i,i\ge 1\}\) be a sequence of independent and identically distributed standard exponential random variables and \(S_j = E_1 + \cdots + E_j,\ j \ge 1\). Denote the hazard function of F by \(H(x)=-\log (1-F(x))\). It is easily seen that, for \(x\ge 1\), \(U(x)=H^{\leftarrow }(\log (x))\), where \(H^{\leftarrow }\) is a strictly increasing function, since F is continuous. From this and Relation (4.7) in [17], we get the following representation

$$\begin{aligned} \left\{ R_j^{(k)},j\ge 1\right\} {\mathop {=}\limits ^{d}}\left\{ H^{\leftarrow }\left( S_j/k\right) , \ j\ge 1\right\} =\left\{ U\left( e^{S_j/k}\right) , \ j\ge 1\right\} . \end{aligned}$$

Without loss of generality, we can assume that \(R_{m-j}^{(k)}=U\left( e^{\frac{S_{m-j}}{k}}\right)\), where \(0\le j\le k\) and \(m\ge 1\).
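This representation also gives a direct way to simulate k-record sequences without generating the underlying sample, which is convenient for numerical checks. A minimal sketch under the representation above (the function name is ours):

```python
import math
import random

def simulate_k_records(U, m, k, rng=None):
    """Simulate R_1^{(k)}, ..., R_m^{(k)} via R_j^{(k)} = U(exp(S_j / k)),
    where S_j is the sum of j independent standard exponentials."""
    rng = rng or random.Random()
    s, records = 0.0, []
    for _ in range(m):
        s += rng.expovariate(1.0)          # S_j = S_{j-1} + E_j
        records.append(U(math.exp(s / k)))
    return records
```

For instance, \(U(y)=y\) corresponds to the standard Pareto distribution, for which \(\gamma =1\).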

For the case \(\gamma >0\), Louzaoui and El Arrouchi [15] gave bounds for the solution \(t^*\) of \(h_m(t)=0\). More precisely, under the first order condition (2) and when \(\delta _m\rightarrow 0,\ k\rightarrow \infty ,\ k/m\rightarrow 0\) and \(k/\log m\rightarrow \infty\) as \(m\rightarrow \infty\), they proved the existence of a random integer N such that \(h_m(T_m^{(\delta _m)})<0\) and \(h_m(T_m^{(-\delta _m)})>0\) for any \(m\ge N\) almost surely, where \(T_m^{(\delta _m)}:=(1+\delta _m)/R^{(k)}_{m-k}\) (see [15, Lemma 3]). From this, the existence of a random variable \(T_m^*\in [T_m^{(-\delta _m)},T_m^{(\delta _m)}]\) such that, almost surely, \(h_m(T_m^*)=0\) is assured by the intermediate value theorem. Notice that the condition \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\) implies \(k/\log m\rightarrow \infty\).

Let \(W_{m}=\frac{R_{m}^{(k)}}{R_{m-k}^{(k)}}\). We have, as \(m\rightarrow \infty\),

$$\begin{aligned} f_m(T_m^{(\delta _m)})-\gamma =&\log (W_{m}+\delta _{m}(W_{m}-1))-\gamma \\ =&\log W_{m} + \delta _{m}\frac{W_{m}-1}{W_{m}}- \gamma +o_{p}(\delta _{m})\\ =&\log U\left( e^{\frac{S_{m}-S_{m-k}}{k}}e^{\frac{S_{m-k}}{k}}\right) -\log U\left( e^{\frac{S_{m-k}}{k}}\right) -\gamma \frac{S_{m}-S_{m-k}}{k}\\&+\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +\delta _{m}\frac{W_{m}-1}{W_{m}}+o_{p}(\delta _{m}). \end{aligned}$$

From (5) and by Theorem B.2.18 in [6], there exists, for each \(\epsilon >0\), a \(t_{0}=t_{0}(\epsilon )\) such that for \(x\ge 1\) and \(t>t_{0}\),

$$\begin{aligned} {\displaystyle \left|\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}- \frac{x^{\rho }-1}{\rho }\right|\le \epsilon x^{\rho +\epsilon }.} \end{aligned}$$

Take \(t=e^{\frac{S_{m-k}}{k}}\), \(x=e^{\frac{S_{m}-S_{m-k}}{k}}\) and observe that as \(m\rightarrow \infty\), \(t\rightarrow \infty\), \(x\rightarrow e\) and \(\frac{x^{\rho }-1}{\rho }\pm \epsilon x^{\rho +\epsilon }\rightarrow e^{\rho }\mu (\rho )\pm \epsilon e^{\rho +\epsilon }\) almost surely, see Lemma 1 in [15]. We get, for each \(\epsilon >0\), almost surely

$$\begin{aligned} e^{\rho }\mu (\rho )- \epsilon e^{\rho +\epsilon }\le \liminf _m \frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}, \end{aligned}$$

and so

$$\begin{aligned} e^{\rho }\mu (\rho )\le \liminf _m \frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}. \end{aligned}$$

Similarly, \(\limsup\limits_m\frac{\log U(tx)-\log U(t)-\gamma \log x}{A(t)}\le e^{\rho }\mu (\rho )\) almost surely. Thus, as \(m\rightarrow \infty\), almost surely

$$\begin{aligned} \log U(tx)-\log U(t)-\gamma \log x=e^{\rho }\mu (\rho )A(t)+o(A(t)). \end{aligned}$$

Hence, as \(m\rightarrow \infty\)

$$\begin{aligned} \begin{array}{ll} f_m(T_m^{(\delta _m)})-\gamma =&{}\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +A\left( e^{\frac{S_{m-k}}{k}}\right) e^{\rho }\mu (\rho ) \\ {} &{} +o_{p}\left( A\left( e^{\frac{S_{m-k}}{k}}\right) \right) +O_{p}(\delta _m) \end{array} \end{aligned}$$
(7)

Notice that the central limit theorem implies, as \(m\rightarrow \infty\)

$$\begin{aligned} \sqrt{k}\left( \frac{S_{m}-S_{m-k}}{k}-1\right) \overset{d}{\rightarrow }N_1, \end{aligned}$$
(8)

where \(N_{1}\) is a random variable having a standard normal distribution.

On the other hand, by \(\left( m\log \log m\right) ^{1/2}/k\rightarrow 0\) and using the law of the iterated logarithm, we have as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{S_{m-k}}{k}-\frac{m}{k}{\mathop {\rightarrow }\limits ^{p}} -1, \end{aligned}$$
(9)

and by the fact that \(A \in RV_{\rho }\), we get as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{A\left( e^{\frac{S_{m-k}}{k}}\right) }{A\left( e^{m/k}\right) }{\mathop {\rightarrow }\limits ^{p}} e^{-\rho }. \end{aligned}$$
(10)

Choosing \(\delta _{m}\) such that \(\sqrt{k}\delta _{m}\rightarrow 0\) and combining (7), (8) and (10) with \(\sqrt{k}A(e^{m/k})\longrightarrow \lambda\), we find that \(\sqrt{k}\left( f_m(T_m^{(\delta _m)})-\gamma \right)\) is asymptotically normal with mean \(\lambda \mu (\rho )\) and variance \(\gamma ^{2}\). The same arguments show that \(\sqrt{k}\left( f_m(T_m^{(-\delta _m)})-\gamma \right)\) is asymptotically normal with the same mean and variance. Since \(f_{m}\) is an increasing function, we have for sufficiently large m

$$\begin{aligned} \sqrt{k}\left( f_m(T_m^{(-\delta _m)})-\gamma \right) \le \sqrt{k}\left( f_m(t^{*})-\gamma \right) \le \sqrt{k}\left( f_m(T_m^{(\delta _m)})-\gamma \right) , \end{aligned}$$

which gives the result (i).

To prove the asymptotic normality of \({\hat{\sigma _m}}\), we use the following expansion

$$\begin{aligned} \begin{array}{ll}\frac{{\hat{\sigma _m}}}{a\left( e^{(m-k)/k}\right) }-1= &{} \left( \frac{{\hat{\gamma _m}}}{\gamma }-1\right) \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }\frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) } +\left( \frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }-1\right) \\ &{} +\left( \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }-1\right) \frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }:=T_1+T_2+T_3.\end{array} \end{aligned}$$

First consider \(T_1\). For sufficiently large m, we have almost surely

$$\begin{aligned} (1-\delta _m)\frac{a\left( e^{S_{m-k}/k}\right) }{\gamma U\left( e^{S_{m-k}/k}\right) }\le \frac{t^*a\left( e^{S_{m-k}/k}\right) }{\gamma }\le (1+\delta _m)\frac{a\left( e^{S_{m-k}/k}\right) }{\gamma U\left( e^{S_{m-k}/k}\right) }. \end{aligned}$$
(11)

From Lemma 1.2.9 in [6], we have \(a(t)\sim \gamma U(t)\) as \(t\rightarrow \infty\). Then from (11) we get that \(\frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }\overset{p}{\rightarrow }1\) as \(m\rightarrow \infty\). Since \(a\in RV_{\gamma }\), (9) implies \(\frac{a\left( e^{S_{m-k}/k}\right) }{a\left( e^{(m-k)/k}\right) }{\mathop {\rightarrow }\limits ^{p}} 1\) as \(m\rightarrow \infty\). Hence, as \(m\rightarrow \infty\)

$$\begin{aligned} T_1=o_p\left( \frac{\sqrt{m-k}}{k}\right) . \end{aligned}$$

Next consider \(T_2\). We have by Theorem 2.3.3 in [6]

$$\begin{aligned} \displaystyle \lim _{t\rightarrow \infty }\frac{a(tx)/a(t)-x^{\gamma }}{A(t)}=\frac{x^{\rho }-1}{\rho }x^{\gamma }, \end{aligned}$$

and consequently, for each \(\epsilon >0\), there exists a \(t_{0}=t_{0}(\epsilon )\) such that for \(t>t_{0}\), \(x\ge 1\),

$$\begin{aligned} \left|\frac{a(tx)/a(t)-x^{\gamma }}{A_0(t)}-x^{\gamma }\frac{x^{\rho }-1}{\rho }\right|\le \epsilon x^{\gamma +\rho +\epsilon },\ \text {with}\ A_0(t)\sim A(t),\ t\rightarrow \infty . \end{aligned}$$

Take \(t=e^{(m-k)/k}\) and \(x=e^{(S_{m-k}-(m-k))/k}\) and observe again that as \(m\rightarrow \infty\), \(t\rightarrow \infty\), \(x\rightarrow 1\) and \(x^{\gamma } \frac{x^{\rho }-1}{\rho }\pm \epsilon x^{\gamma +\rho +\epsilon }\rightarrow \pm \epsilon\) almost surely. We get

$$\begin{aligned} \begin{array}{ll}T_2&{}= e^{\gamma (S_{m-k}-(m-k))/k}-1+A_0\left( e^{(m-k)/k}\right) O_p(1)\\ {} &{} =\gamma (S_{m-k}-(m-k))/k+o_p(\sqrt{m-k}/k)+A_0\left( e^{(m-k)/k}\right) O_p(1), \end{array} \end{aligned}$$

and so, by the central limit theorem, as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{k}{\sqrt{m-k}}T_2\overset{d}{\rightarrow }\gamma N_2. \end{aligned}$$

Notice that, without loss of generality, we can take \(N_1\) and \(N_2\) to be independent random variables.

Thirdly, consider \(T_3\). From (11), we have for sufficiently large m,

$$\begin{aligned} B_{\delta _m}:=\frac{1}{1+\delta _m}\left( \frac{\gamma U\left( e^{S_{m-k}/k}\right) }{a\left( e^{S_{m-k}/k}\right) }-1\right) -\frac{\delta _m}{1+\delta _m}\le \frac{\gamma }{t^*a\left( e^{S_{m-k}/k}\right) }-1\le B_{-\delta _m}. \end{aligned}$$

Next, adapting Lemma 4.5.4 in [6] to the case where \(\gamma\) is positive, we get that, if (\(\gamma >-\rho\)) or (\(0<\gamma <-\rho\) and \(\lim _{t\rightarrow \infty }a(t)-\gamma U(t)=0\)), then

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{1-\frac{\gamma U(t)}{a(t)}}{A(t)}=\frac{1}{\rho +\gamma }. \end{aligned}$$
(12)

Combining this with \(\sqrt{k}\delta _{m}\rightarrow 0\), we have as \(m\rightarrow \infty\)

$$\begin{aligned} \frac{k}{\sqrt{m-k}}B_{\delta _m}=\frac{k}{\sqrt{m-k}}B_{-\delta _m}=o_p(1). \end{aligned}$$

Thus, as \(m\rightarrow \infty\)

$$\begin{aligned} T_3=o_p\left( \frac{\sqrt{m-k}}{k}\right) . \end{aligned}$$

Finally, the combination of the three parts proves (ii).

The proof for \(\gamma <0\) is the same as before, with slight modifications. It was proved by Louzaoui and El Arrouchi [15] that if \(k\rightarrow \infty ,\ k/m\rightarrow 0\) and \(k/\log m\rightarrow \infty\) as \(m\rightarrow \infty\), then the first order condition (2) ensures the existence of a solution \(t^{*}\) of \(h_m(t)=0\) such that, almost surely, \(T_m^{(-\delta _m)}<t^{*}<T_m^{(\delta _m)}\) for some small \(\delta _m>0\), where \(T_m^{(\delta _m)}:=-\frac{1+\delta _m}{U(\infty )-R^{(k)}_{m-k}}\). Similarly to (7), we have as \(m\rightarrow \infty\)

$$\begin{aligned} \begin{array}{ll}f_m(T_m^{(\pm \delta _m)})-\gamma =&{}\gamma \left( \frac{S_{m}-S_{m-k}}{k}-1\right) +A\left( e^{\frac{S_{m-k}}{k}}\right) e^{\rho }\mu (\rho ) \\ {} &{} +o_{p}\left( A\left( e^{\frac{S_{m-k}}{k}}\right) \right) +O_{p}(\delta _m) \end{array}. \end{aligned}$$

The rest is similar, except that for \(\gamma <0\) the relation (12) becomes

$$\begin{aligned} \displaystyle \lim _{t\rightarrow \infty }\frac{\frac{U(\infty )- U(t)}{a(t)}+1/\gamma }{A(t)}=\frac{1}{\gamma (\rho +\gamma )}, \end{aligned}$$

provided the second order condition holds. Finally, statement (2) follows directly from (7) and (10). \(\square\)

Remark 2.2

Notice that, for \(0<\gamma <-\rho\), the relation (12) is a special case of the general relation

$$\begin{aligned} \lim _{t\rightarrow \infty }\dfrac{1-\frac{\gamma U(t)}{a(t)}-\displaystyle \lim _{x\rightarrow \infty }(a(x)/\gamma - U(x))}{A(t)}=\frac{1}{\gamma +\rho }, \end{aligned}$$

which, unfortunately, for \(\displaystyle \lim _{x\rightarrow \infty }(a(x)/\gamma - U(x))\ne 0\), does not ensure the desired approximation of \(T_3\). A similar remark can be made for \(\gamma =-\rho\).

In order to obtain an unbiased estimator of \(\gamma\), it can be seen from the asymptotic expansion (7) that it is necessary to eliminate the term \(A(e^{m/k})\) and to replace \(\rho\) by a consistent estimator. Define, for integers \(n\ge k\ge 1\),

$$\begin{aligned} {\widetilde{\gamma }}_{n}(.)={\hat{\gamma }}_{N^{(k)}(n)}([.]), \ \ \displaystyle {\hat{\rho }}_n=(\log 2)^{-1}\log \frac{{\widetilde{\gamma }}_{n}(n/(2\log n))-{\widetilde{\gamma }}_{n}(n/(4\log n))}{{\widetilde{\gamma }}_{n}(n/\log n)-{\widetilde{\gamma }}_{n}(n/(2\log n))} \end{aligned}$$

and

$$\begin{aligned} {\bar{\gamma }}_n(k)={\widetilde{\gamma }}_{n}(k)-\frac{{\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)}{1-4^{{\hat{\rho }}_n}}, \end{aligned}$$

where \(N^{(k)}(n)\) denotes the number of k-record values in the sequence \(X_1,\ldots ,X_n\) and [x] is the largest integer less than or equal to x. Then we have the following theorem.
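Given any routine implementing \(k\mapsto {\widetilde{\gamma }}_{n}(k)\), the estimators \({\hat{\rho }}_n\) and \({\bar{\gamma }}_n(k)\) are direct to compute. A minimal sketch, where gamma_tilde is an assumed black box for \({\widetilde{\gamma }}_{n}\):

```python
import math

def rho_hat(gamma_tilde, n):
    """Estimator of rho from three values of gamma_tilde, as defined above;
    gamma_tilde(k) is assumed to implement k -> tilde{gamma}_n(k)."""
    g1 = gamma_tilde(int(n / math.log(n)))
    g2 = gamma_tilde(int(n / (2.0 * math.log(n))))
    g3 = gamma_tilde(int(n / (4.0 * math.log(n))))
    return math.log((g2 - g3) / (g1 - g2)) / math.log(2.0)

def gamma_bar(gamma_tilde, k, rho):
    """Bias-corrected estimator bar{gamma}_n(k) for a given (estimated) rho."""
    return gamma_tilde(k) - (gamma_tilde(k) - gamma_tilde(k // 4)) / (1.0 - 4.0 ** rho)
```

As a sanity check on the bias mechanism: for the toy model \({\widetilde{\gamma }}_{n}(k)= \gamma +k/n\) (corresponding to \(\rho =-1\)), rho_hat returns approximately \(-1\) and gamma_bar removes the bias almost exactly.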

Theorem 2.3

Assume (5) holds for \(\rho <0\). Assume \(k=k_n\rightarrow \infty\), \(k/n\rightarrow 0\), \(\log (n/k)\log \log n=o(k)\) and \(k/\log n\rightarrow \infty\) as \(n\rightarrow \infty\).

  • (i) If \(\lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda \in {\mathbb {R}}\), then as \(n\rightarrow \infty\)

    $$\begin{aligned}\sqrt{k}({{\widetilde{\gamma }}}_n(k)-\gamma ){\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}\left( \lambda \mu (\rho ),\gamma ^{2}\right),\end{aligned}$$

    and

    $$\begin{aligned} \sqrt{k}({{\bar{\gamma }}}_n(k)-\gamma ){\mathop {\rightarrow }\limits ^{d}} \mathcal N\left( 0,{{\tilde{\sigma }}}^{2}\right) , \end{aligned}$$

    with \(\displaystyle {{\tilde{\sigma }}}^{2}=\frac{(1-4^{\rho })^2+4^{2\rho }}{2(1-4^{\rho })^2}\gamma ^2\).

  • (ii) If \(\lim _{n\rightarrow \infty }\sqrt{k}|A(n/k)|=\infty\), then as \(n\rightarrow \infty\)

    $$\begin{aligned} (A(n/k))^{-1}({{\widetilde{\gamma }}}_n(k)-\gamma ){\mathop {\longrightarrow }\limits ^{p}}\mu (\rho ). \end{aligned}$$

Remark 2.4

Notice that if \(\rho \ge -1/2\), then \({{\tilde{\sigma }}}^{2} \ge \gamma ^2\), and if \(\rho \le -1/2\), then \({{\tilde{\sigma }}}^{2} \le \gamma ^2\).

Proof

We use the same arguments as in the proof of Theorem 2.1. We can write, for \(\delta _m=o(1/\sqrt{k})\) as \(m\rightarrow \infty\),

$$\begin{aligned} {\hat{\gamma }}_m(k)-\gamma =\gamma \left( \frac{S_{k}}{k}-1\right) +A\left( e^{\frac{m}{k}}\right) \mu (\rho )+o_{p}\left( A\left( e^{\frac{m}{k}}\right) \right) +o_{p}\left(\frac{1}{\sqrt{k}}\right). \end{aligned}$$
(13)

By Proposition 2.2 in [7], we have, as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n) - k \log (n/k) = B(k \log (n/k)) + O(\log (k \log (n/k))), \end{aligned}$$

where \(\{B(t), t\ge 0\}\) is a standard Brownian motion, and by Theorem 1.3.1 in [5], we get as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n)-k \log (n/k) = O\left( (k \log (n/k) \log (k \log (n/k)))^{1/2}\right) . \end{aligned}$$

Furthermore, since \(\log (k \log (n/k)) < \log \log n\), we have as \(n\rightarrow \infty\), almost surely

$$\begin{aligned} N^{(k)}(n)-k\log (n/k) = O((k \log (n/k) \log \log n)^{1/2}). \end{aligned}$$

Combining this with the fact \(A\in RV_\rho\), we get from (13)

$$\begin{aligned} {\hat{\gamma }}_{N^{(k)}(n)}(k)-\gamma =\gamma \left( \frac{S_{k}}{k}-1\right) +A\left( {n/k}\right) \mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/k}\right) \right) , \end{aligned}$$

and so, for \(0 < s\le 1\)

$$\begin{aligned} {\hat{\gamma }}_{N^{([sk])}(n)}([sk])-\gamma =\gamma \left( \frac{S_{[sk]}}{[sk]}-1\right) +A\left( {n/[sk]}\right) \mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/[sk]}\right) \right) , \end{aligned}$$

which, by Donsker’s invariance principle, gives for \(0<s\le 1\) and \(n\rightarrow \infty\),

$$\begin{aligned} {{\widetilde{\gamma }}}_{n}(sk)-\gamma =\gamma \frac{B(\sqrt{s})}{\sqrt{k}}+A\left( {n/k}\right) s^{-\rho }\mu (\rho )+o_{p}(1/\sqrt{k})+o_{p}\left( A\left( {n/k}\right) \right) . \end{aligned}$$
(14)

This gives the first part of (i) and (ii).

Next, observe that if \(k\in \{n/\log n,n/(2\log n),n/(4\log n)\}\), then all conditions on the sequence k are fulfilled with \(\lim _{n\rightarrow \infty }\sqrt{k}|A(n/k)|=\infty\). Since, as \(n\rightarrow \infty\), \(A(2 \log n) \sim 2^{\rho } A(\log n)\) and \(A(4 \log n) \sim 4^{\rho } A(\log n)\), we have from the statement (ii)

$$\begin{aligned} \begin{array}{ll}{\hat{\rho }}_n&{}\displaystyle =(\log 2)^{-1}\log \frac{(A(\log n))^{-1}\left\{ {\widetilde{\gamma }}_{n}(n/(2\log n))-\gamma -({\widetilde{\gamma }}_{n}(n/(4\log n))-\gamma )\right\} }{(A(\log n))^{-1}\left\{ {\widetilde{\gamma }}_{n}(n/\log n)-\gamma -({\widetilde{\gamma }}_{n}(n/(2\log n))-\gamma )\right\} }\\ \\ &{}\displaystyle {\mathop {\rightarrow }\limits ^{p}}(\log 2)^{-1}\log \frac{2^{\rho }-4^{\rho }}{1-2^{\rho }}=\rho , \ \text {as}\ n\rightarrow \infty .\end{array} \end{aligned}$$

From (14) we deduce that if \(\displaystyle \lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda\),

$$\begin{aligned} \sqrt{k}\Big ({\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)\Big ){\mathop {\rightarrow }\limits ^{d}}\gamma (B(1)-B(1/2))+\lambda (1-4^{\rho })\mu (\rho ), \ \text {as}\ n\rightarrow \infty , \end{aligned}$$

and so, as \(n\rightarrow \infty\),

$$\begin{aligned} \begin{array}{ll}\displaystyle \sqrt{k} \left( {\widetilde{\gamma }}_{n}(k)-\frac{{\widetilde{\gamma }}_{n}(k)-{\widetilde{\gamma }}_{n}(k/4)}{1-4^{{\hat{\rho }}_n}}\right) &{} \displaystyle {\mathop {\rightarrow }\limits ^{d}}\gamma B(1)-\frac{\gamma }{1-4^{\rho }} (B(1)-B(1/2))\\ &{}\displaystyle =\frac{\gamma }{1-4^{\rho }}\left[ (1-4^{\rho })B(1/2)-4^{\rho }(B(1)-B(1/2))\right] ,\end{array} \end{aligned}$$

which gives the second part of (i). \(\square\)

Corollary 2.5

Assume the conditions of Theorem 2.3 hold. Let \(MSE({{\widetilde{\gamma }}}_n(k))\) and \(MSE({{\bar{\gamma }}}_n(k))\) be the mean square errors of \({{\widetilde{\gamma }}}_n(k)\) and \({{\bar{\gamma }}}_n(k)\), respectively. If \(\lim _{n\rightarrow \infty }\sqrt{k}A(n/k)=\lambda \in {\mathbb {R}}\), then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{MSE({{\widetilde{\gamma }}}_n(k))}{MSE({\bar{\gamma }}_n(k))}\left\{ \begin{array}{lll}\ge 1 &{}\text {if}&{} \displaystyle \rho \le -1/2, \ \text {or} \ \rho \ge -1/2 \ \text {and} \ |\lambda |\ge \frac{\gamma }{(1-4^{\rho })\mu (\rho )}\sqrt{\frac{2\cdot 4^{\rho }-1}{2}}, \\ \le 1&{}\text {if} &{} \displaystyle \rho \ge -1/2 \ \text {and} \ |\lambda |\le \frac{\gamma }{(1-4^{\rho })\mu (\rho )}\sqrt{\frac{2\cdot 4^{\rho }-1}{2}}.\end{array}\right. \end{aligned}$$

Proof

This follows easily from Theorem 2.3. \(\square\)

Remark 2.6

  1. In the same way, we can obtain results similar to those of Theorem 2.3 and Corollary 2.5 in the case where \(\gamma <0\).

  2. The method used here cannot work for \(\gamma = 0\), because the bounds found in [15] are almost surely constant and therefore not asymptotically normal.

  3. This method is not applicable to the bounds proposed by Zhou [18], since they are not symmetric.

3 Simulation results

We now present some numerical results for the proposed bias correction. We consider the Generalized Pareto distribution with \(F(x)=1-(1+\frac{\gamma }{\sigma } x)^{-1/\gamma }\), for \(x\ge 0\) and \(\gamma , \ \sigma >0\), the Burr IV distribution with \(F(x)= \{(\alpha /x-1)^{1/\alpha }+1\}^{-\beta }\), \(0<x<\alpha\), \(\beta >0\), and the standard Cauchy distribution with \(F(x)=\frac{1}{2}+\frac{1}{\pi } \arctan (x)\), for \(x\in {\mathbb {R}}\). For each of these distributions, we generate a random sample of size n, pick up the record values and compute the corresponding estimates. The simulation results are reported in Tables 1, 2, 3 and 4, where \(\overline{{\tilde{\gamma }}}\) and \(\overline{{\bar{\gamma }}}\) are the averages of 10,000 estimates of \({\tilde{\gamma }}\) and \({\bar{\gamma }}\), and \(MSE({\tilde{\gamma }})\) and \(MSE({\bar{\gamma }})\) denote their respective mean square errors. The simulated values are calculated for three sizes n against k, chosen so that the number of records \(N^{(k)}(n)\) is reasonable (using the approximation \(N^{(k)}(n)\sim k\log (n/k)\)). We remark that when the mean squared error values were rounded to the fourth decimal place, some values were repeated. We observe that the simulated values of \({\tilde{\gamma }}\) and \({\bar{\gamma }}\) are close to the theoretical value of \(\gamma\), and \({\bar{\gamma }}\) is frequently closer to the theoretical \(\gamma\) than \({\tilde{\gamma }}\). Unfortunately, the comparison of the MSEs did not allow us to conclude; we think this is due to the variability of the number of records in the original sample.

Table 1 Cauchy distribution: \(\gamma = 1\) and \(\rho = -2\)
Table 2 Burr IV distribution: \(\alpha = \beta = 1\) and \(\gamma =\rho = -1\)
Table 3 Generalized Pareto distribution: \(\gamma = 0.5\), \(\sigma = 1\) and \(\rho = -2\)
Table 4 Burr IV distribution: \(\alpha = 0.5\), \(\beta = 1\), \(\gamma =\rho = -0.5\)

4 Real data

In this section, we apply our estimation method to rainfall data collected monthly from 1975 until 2007 at the Melk Zhar station in the Souss Massa region of Morocco (Fig. 1). These estimates are compared with the most commonly used methods: block maxima (GEV, from (1)) and POT (GPD, from (3)). Table 5 shows the estimated parameters of the GEV distribution. The estimated shape parameter is positive, but the 95% confidence interval also extends below zero, which shows that \(\gamma\) is not significantly different from 0 at the 5% significance level. Considering only block maxima when just a few years of observations are available can cause a great waste of data, since there may be more than one extreme measurement in a single block. In general, the peaks-over-threshold approach can be used to obtain improved accuracy: in particular, if the block maxima can be fitted by a GEV distribution, then the excesses over a high threshold t can be fitted by a GPD. Two techniques are used for threshold selection, namely the mean residual life plot and the stability of the parameter estimates. The linearity of the mean residual life and the stability of the GPD parameters are both reached at \(t = 20\), for which the excesses comprise 89 observations. From Table 6, we have \(\hat{\gamma }=2.5\times 10^{-8}\), which is very close to 0. Next, using the equations in (4), Table 7 summarizes our estimates for some selected values of k, which again confirms the closeness of \(\gamma\) to 0. Consequently, the Gumbel model (GEV with \(\gamma =0\)) is a suitable model for our data. This is supported by the diagnostic plots in Fig. 2. Under the Gumbel model, the return level \(z_{\alpha }\) at return period \(1/{\alpha }\) is \(z_{\alpha } =\mu - \sigma \log (-\log (1-\alpha ))\), where \(\mu\) and \(\sigma\) are estimated in Table 8; on average, \(z_{\alpha }\) is exceeded once every \(1/{\alpha }\) years. The return level estimates are given in Table 9.
Hence, the return level estimates indicate that the maximum value 144.6 (the maximum total monthly rainfall recorded at Melk Zhar, see Fig. 1) will not be exceeded in the next 20 years, but will be exceeded in the next 50 years.
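The Gumbel return level formula is straightforward to evaluate. A minimal sketch (the parameter values in the example are illustrative placeholders, not the estimates of Table 8):

```python
import math

def gumbel_return_level(mu, sigma, period):
    """Return level z_alpha for the Gumbel model with alpha = 1/period:
    the level exceeded on average once every `period` years."""
    alpha = 1.0 / period
    return mu - sigma * math.log(-math.log(1.0 - alpha))

# illustrative (not fitted) location and scale parameters
levels = {p: gumbel_return_level(30.0, 20.0, p) for p in (10, 20, 50, 100)}
```

As expected, the return level is increasing in the return period.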

Fig. 1
figure 1

Monthly total rainfall from 1975 until 2007

Table 5 Estimated parameters of the GEV
Table 6 Estimated parameters for POT with threshold=20
Table 7 Estimates for \((\gamma ,\sigma )\) based on equations in (4) by using the rainfall data
Fig. 2
figure 2

Diagnostic plots for the Gumbel model

Table 8 Estimated parameters of the Gumbel model
Table 9 Return level estimates for the rainfall data