Priority statement and some properties of t-lgHill estimator

We acknowledge priority for the introduction of the t-lgHill estimator of the positive extreme value index. We provide a novel motivation for this estimator based on ecologically driven dynamical systems. Another motivation is obtained directly by applying the general t-Hill procedure to the log-gamma distribution. We illustrate the good quality of the t-lgHill estimator in comparison with the classical Hill estimator on novel data on the concentration of arsenic in drinking water in the rural area of the Arica and Parinacota Region, Chile.


Theoretical motivation of t-lgHill estimator
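This is a priority letter on the first introduction of the t-lgHill estimator for the positive extreme value index $\gamma > 0$:

\[ H^{L}_{k_n,n} = \frac{M^{(2)}_{k_n,n}}{M^{(1)}_{k_n,n}} - M^{(1)}_{k_n,n}, \]

where

\[ M^{(j)}_{k_n,n} = \frac{1}{k_n} \sum_{i=0}^{k_n-1} \left( \ln X_{n-i,n} - \ln X_{n-k_n,n} \right)^{j}, \quad j = 1, 2, \]

and $X_{1,n} \le \dots \le X_{n,n}$ are the order statistics of $X_1, \dots, X_n$. We recall that the statistics $M^{(j)}_{k_n,n}$, $j = 1, 2$, were introduced in [2], while $\hat{\gamma}^{(H)}_{k_n,n} = M^{(1)}_{k_n,n}$ is nothing else but the Hill [6] estimator. We also recall that the estimator $H^{L}_{k_n,n}$, like the Hill estimator, is applicable to distributions $F$ whose generalized inverse $F^{\leftarrow}$ satisfies the following condition: the function $U(t) = F^{\leftarrow}(1 - 1/t)$, $t > 1$, varies regularly at infinity with positive index $\gamma$. $H^{L}_{k_n,n}$ estimates the parameter $\gamma$, but for better readability we also use the tail index $\alpha = 1/\gamma$.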
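The following minimal sketch shows how the two estimators could be computed from a sample. It assumes that the t-lgHill estimator has the form $M^{(2)}/M^{(1)} - M^{(1)}$, which is consistent with the identity $\hat{\gamma}^{G}_{k_n,n}(-1) \equiv H^{L}_{k_n,n}$ and with the asymptotic variance 5 quoted later in the text. The function names and the synthetic Pareto sample are illustrative assumptions and are not taken from [5] or [7].

```python
import math
import random


def log_excess_moments(sample, k):
    """Return (M1, M2), the first two empirical moments of the k top log-excesses."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = math.log(xs[n - k - 1])  # ln X_{n-k,n}
    excesses = [math.log(x) - threshold for x in xs[n - k:]]
    m1 = sum(excesses) / k
    m2 = sum(e * e for e in excesses) / k
    return m1, m2


def hill(sample, k):
    """Hill estimator of gamma: the first moment of the log-excesses."""
    m1, _ = log_excess_moments(sample, k)
    return m1


def t_lg_hill(sample, k):
    """t-lgHill estimator of gamma in the assumed form M2 / M1 - M1."""
    m1, m2 = log_excess_moments(sample, k)
    return m2 / m1 - m1


# Illustration on synthetic (hypothetical) Pareto data with tail index alpha = 2,
# so that the target value is gamma = 1 / alpha = 0.5.
random.seed(1)
sample = [random.paretovariate(2.0) for _ in range(2000)]
print("Hill:", hill(sample, 200), "t-lgHill:", t_lg_hill(sample, 200))
```

Both functions return estimates of $\gamma$; an estimate of the tail index is obtained as the reciprocal.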
We have found that the estimator $H^{L}_{k_n,n}$ was first developed in [5], see (2.10) therein. We were not aware of this fact at the time of publication of [7], where we developed the estimator $H^{L}_{k_n,n}$, see Theorem 1 in [7]. The main purpose of this priority letter is to demonstrate that the estimator $H^{L}_{k_n,n}$ was introduced in [5] and in [7] by different approaches.

In [5] the estimator $H^{L}_{k_n,n}$ has been built by assuming a second-order regular variation condition in which $\rho < 0$ is the second-order parameter, while $A(t)$ is a measurable function with constant sign near infinity and $A(t) \to 0$ as $t \to \infty$. Precisely, the authors of [5] considered the generalized jackknife statistic

\[ \frac{\hat{\gamma}^{(H)}_{k_n,n} - q_n \hat{\gamma}^{(MR)}_{k_n,n}}{1 - q_n}, \]

where $\hat{\gamma}^{(MR)}_{k_n,n} = M^{(2)}_{k_n,n} / (2 M^{(1)}_{k_n,n})$ is the so-called moment ratio estimator (see [1]) and $q_n$ is the ratio of the asymptotic biases of the Hill and moment ratio estimators. From Theorem 1 in [1] it follows that $q_n = 1 - \rho$. Substituting $1 - \hat{\rho}_n$ for $q_n$, where $\hat{\rho}_n$ is a weakly consistent estimator of the parameter $\rho$, we obtain the estimator $\hat{\gamma}^{G}_{k_n,n}(\hat{\rho}_n) = \big( \hat{\gamma}^{(H)}_{k_n,n} - (1 - \hat{\rho}_n) \hat{\gamma}^{(MR)}_{k_n,n} \big) / \hat{\rho}_n$.
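As a hedged illustration of the jackknife construction, the sketch below combines the Hill and moment ratio estimators with weight $q = 1 - \rho$ and checks numerically that the choice $\rho = -1$ gives $M^{(2)}/M^{(1)} - M^{(1)}$. The standard generalized jackknife form $(T_1 - q T_2)/(1 - q)$ is assumed here, and the function names are hypothetical.

```python
def moment_ratio(m1, m2):
    """Moment ratio estimator M2 / (2 * M1)."""
    return m2 / (2.0 * m1)


def generalized_jackknife(m1, m2, rho):
    """Generalized jackknife of the Hill (m1) and moment ratio estimators.

    Assumes q = 1 - rho with rho < 0, so that 1 - q = rho is nonzero.
    """
    if rho >= 0:
        raise ValueError("rho must be negative")
    q = 1.0 - rho
    return (m1 - q * moment_ratio(m1, m2)) / (1.0 - q)


# With rho = -1 the statistic reduces to M2 / M1 - M1.
m1, m2 = 1.5, 2.5  # arbitrary illustrative values of the two moments
print(generalized_jackknife(m1, m2, -1.0), m2 / m1 - m1)
```

Fixing $\rho = -1$ removes the need to estimate the second-order parameter, which is the motivation attributed to [5] in the text.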
This estimator was first introduced in [9]. Motivated by the fact that the asymptotic bias and variance of existing estimators of $\rho$ are high, the authors of [5] considered the estimator $\hat{\gamma}^{G}_{k_n,n}(-1) \equiv H^{L}_{k_n,n}$. We observed that Theorem 1 in [7] is a direct consequence of the distributional representation of $H^{L}_{k_n,n}$ provided in [5], see (2.11) therein. We also acknowledge a numerical typo in the variance in Theorem 1 of [7], where the asymptotic variance 8 should be replaced by 5.
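The corrected constant 5 can be checked with a short delta-method calculation. The sketch below is not the proof from [5] or [7]. It assumes the idealized situation in which the scaled log-excesses behave like independent standard exponential variables, and it linearizes $M^{(2)}/M^{(1)} - M^{(1)}$ around $M^{(1)} = \gamma$, $M^{(2)} = 2\gamma^{2}$, which gives the coefficients $-3$ and $2$ for the relative fluctuations of $M^{(1)}$ and $M^{(2)}$.

```python
from fractions import Fraction
from math import factorial


def exp_moment(j):
    """j-th moment of a standard exponential variable, equal to j!."""
    return Fraction(factorial(j))


# Variances and covariance of E and E^2 for a standard exponential E.
var_e1 = exp_moment(2) - exp_moment(1) ** 2                  # 1
var_e2 = exp_moment(4) - exp_moment(2) ** 2                  # 20
cov_e1_e2 = exp_moment(3) - exp_moment(1) * exp_moment(2)    # 4

# Relative fluctuations Z1 = (M1 - g) / g and Z2 = (M2 - 2 g^2) / (2 g^2).
var_z1 = var_e1
var_z2 = var_e2 / 4
cov_z = cov_e1_e2 / 2


def relative_variance(a1, a2):
    """Asymptotic variance, in units of gamma^2, of a1 * Z1 + a2 * Z2."""
    return a1 * a1 * var_z1 + a2 * a2 * var_z2 + 2 * a1 * a2 * cov_z


print("Hill:", relative_variance(1, 0))
print("moment ratio:", relative_variance(-1, 1))
print("t-lgHill:", relative_variance(-3, 2))
```

Under these assumptions the three values are 1, 2 and 5, in agreement with the correction stated above.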
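In [7] the estimator $H^{L}_{k_n,n}$ was introduced by applying the t-score methodology to the log-gamma distribution with pdf

\[ f(x; \theta) = \frac{\alpha^{c}}{\Gamma(c)} (\ln x)^{c-1} x^{-\alpha-1}, \quad x > 1, \tag{1} \]

where $\theta$ consists of a pair of positive parameters $(c, \alpha)$.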
In particular, in [7] we introduced a t-score moment estimator $\hat{\theta}$ as the solution of a system of $m$ t-score moment equations, in which $S_F$ is a scalar score of a random variable with distribution $F$ (and smooth density $f$). Since the log-gamma distribution in (1) has two parameters, we need $m = 2$, yielding two t-score equations, namely equations (15) and (16) in [7]. This defines the t-lgHill estimator of $1/\alpha$ given by (19) in [7].
The t-score (see [3]) of a distribution $F$ (with prescribed support) is defined as

\[ T_F(x; \theta) = -\frac{1}{f(x; \theta)} \frac{d}{dx} \left( \frac{f(x; \theta)}{\eta'(x)} \right), \]

where $f$ is the pdf of $F$ and $\eta$ is a strictly increasing smooth function. It expresses the relative change of a "basic component of the density", i.e., the density divided by the Jacobian of the mapping $\eta$. Notice that the t-score and the Fisher score are related by $S_G(x; \theta) = T_F(x; \theta)$, where $F(x) = G(\eta(x))$ and $S_G(y; \theta) = -\frac{d}{dy} \ln(g(y))$, with $g$ the pdf of $G$. In [8] it is shown how a dynamical system given by a t-score function for some class of monotonic data transformations generates consistent extreme value estimators.
We suppose that the solution $x^{*}$ of $T_F(x; \theta) = 0$ is unique (the t-score mean). Notice that the choice of $\eta$ may change the value of the t-score mean. Since our distribution has support $(1, \infty)$, we look for a function that maps it onto the whole real line. This can be done, e.g., by choosing $\eta(x) = \ln(\ln(x))$, motivated by the well-known law of the iterated logarithm (see [10]). The first or the third iteration of the logarithm cannot be used, because neither of them maps the support $(1, \infty)$ onto the whole real line. The choice $\eta(x) = \ln(\ln(x))$ implies the t-score mean $x^{*} = \exp(c/\alpha) > 0$. Clearly, $T_F$ maps $\eta$ uniquely on $S_G$. But does there exist another smooth function $\eta(x)$, different from $\ln(\ln(x))$, such that $S_G(x; \theta) = T_F(x; \theta)$? This relation implies the exact second-order differential equation of the form

\[ \frac{d}{dx} \left( \frac{f(x; \theta)}{\eta'(x)} \right) = -h(x; \theta), \]

where $h(x; \theta) = S_G(x; \theta) f(x; \theta)$. If we allow $\eta$ to depend on the parameters, then we have several different functions, see [8]. By direct integration we get

\[ \eta'(x) = \frac{f(x; \theta)}{K(\theta) - H(x; \theta)}, \tag{2} \]

where $H$ is a primitive function of $h$ with respect to $x$ and $K(\theta)$ is the integration constant. One can check that for $S_G(x; \theta) = \alpha \ln(x) - c$, the right-hand side of equation (2) does not depend on the parameters if and only if $K(\theta) = 0$. Therefore $\eta(x) = \ln(\ln(x)) + k$, $k \in \mathbb{R}$, which implies the uniqueness.
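The closed form $\alpha \ln(x) - c$ of the t-score can be verified numerically. The sketch below assumes the log-gamma density $f(x; \theta) = \alpha^{c} (\ln x)^{c-1} x^{-\alpha-1} / \Gamma(c)$ on $(1, \infty)$ and the t-score $-\frac{1}{f} \frac{d}{dx}(f / \eta')$ with $\eta(x) = \ln(\ln(x))$. The derivative is approximated by a central difference, and all function names and parameter values are illustrative.

```python
import math


def log_gamma_pdf(x, c, alpha):
    """Assumed log-gamma density on (1, inf) with positive parameters (c, alpha)."""
    log_f = (c * math.log(alpha) - math.lgamma(c)
             + (c - 1.0) * math.log(math.log(x))
             - (alpha + 1.0) * math.log(x))
    return math.exp(log_f)


def basic_component(x, c, alpha):
    """Density divided by eta'(x) = 1 / (x ln x), for eta(x) = ln(ln(x))."""
    return log_gamma_pdf(x, c, alpha) * x * math.log(x)


def t_score_numeric(x, c, alpha, step=1e-6):
    """t-score computed from its definition with a central difference."""
    derivative = (basic_component(x + step, c, alpha)
                  - basic_component(x - step, c, alpha)) / (2.0 * step)
    return -derivative / log_gamma_pdf(x, c, alpha)


def t_score_closed_form(x, c, alpha):
    """Closed form alpha * ln(x) - c stated in the text."""
    return alpha * math.log(x) - c


c, alpha = 2.0, 1.5  # illustrative parameter values
for x in (1.5, 3.0, 10.0):
    print(x, t_score_numeric(x, c, alpha), t_score_closed_form(x, c, alpha))

# The t-score mean x* = exp(c / alpha) is the zero of the closed form.
print(t_score_closed_form(math.exp(c / alpha), c, alpha))
```

The two columns agree up to discretization error, and the closed form vanishes at $x^{*} = \exp(c/\alpha)$.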

Environmental applications of t-lgHill estimator
The good robustness of the t-lgHill estimator was already illustrated in [7]. Namely, the t-lgHill and t-Hill estimators are robust and also reasonably efficient, and thus convenient for mass balance modelling of glaciers and for threshold estimation for lava eruptions.
In this paper we illustrate the good quality of the t-lgHill estimator on the concentration of arsenic in drinking water in the rural area of the Arica and Parinacota Region, Chile. These data are novel, as yet unpublished, and were provided by the Regional Ministry of Health of the Region of Arica and Parinacota. They correspond to measurements made in drinking water supplied by the Rural Potable Water System (APR) or by a Precarious System (SP). Because these systems do not always provide drinking water according to the Chilean norm, the Regional Ministerial Secretary (SEREMI) of Health of Arica and Parinacota periodically measures water quality in the rural area of the above-mentioned Region. In the period 2017-2018 they reported 274 measurements of the concentration of arsenic in drinking water at various locations in the rural area. Many of these measurements are above the allowed standard of 0.01 mg of arsenic per liter.

In Figure 1 we plot a comparison of the t-lgHill and Hill estimators. As we can see, t-lgHill is very stable in comparison with the Hill estimator in the range $k \in \{20, \dots, 50\}$. By further analysis of Figure 1 we can conclude the following.

(i) We consider the heuristic rule that the first constant flat area (from the left) in the plot gives a reasonable estimate of the tail index $\alpha$. The t-lgHill plot is almost constant in the range $10 \le k \le 50$, while for the Hill plot the corresponding range is $1 < k \le 10$. Hence, keeping in mind the small number of observations, we can conclude that both estimators yield an estimate of the tail index approximately equal to 1.

(ii) From the t-lgHill plot we can see that the estimates of $\alpha$ remain close to 1 as the sample fraction $k$ grows from 1 to $n - 1$. The same conclusion also holds for the Hill plot. We acknowledge the suggestion of the Referee to estimate the second-order parameter $\rho$, which, in agreement with the Referee's comment, is indeed close to $-1$ for large values of the sample fraction $k$. See also the range $k \in [266, 272]$ in Figure 2, where we provide a set of sample paths of the estimator $\hat{\rho}^{(\tau)}_n$ studied in [4].
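A comparison of this kind can be sketched as follows. Since the arsenic data are unpublished, the example uses a synthetic Pareto sample of the same size (274) as a hypothetical stand-in, and it assumes the form $M^{(2)}/M^{(1)} - M^{(1)}$ for the t-lgHill estimator. The helper names and the spread measure (maximum minus minimum over a window of $k$) are illustrative choices and are not the procedure used to produce Figure 1.

```python
import math
import random


def estimator_paths(sample, k_values):
    """Return rows (k, Hill, t-lgHill) of gamma estimates for each k in k_values."""
    logs = sorted(math.log(x) for x in sample)
    n = len(logs)
    rows = []
    for k in k_values:
        threshold = logs[n - k - 1]  # ln X_{n-k,n}
        excesses = [v - threshold for v in logs[n - k:]]
        m1 = sum(excesses) / k
        m2 = sum(e * e for e in excesses) / k
        rows.append((k, m1, m2 / m1 - m1))
    return rows


def spread(rows, k_lo, k_hi, column):
    """Max minus min of one estimator (column 1 or 2) over k_lo <= k <= k_hi."""
    values = [row[column] for row in rows if k_lo <= row[0] <= k_hi]
    return max(values) - min(values)


# Hypothetical stand-in data: 274 Pareto observations with tail index 1.
random.seed(0)
data = [random.paretovariate(1.0) for _ in range(274)]
paths = estimator_paths(data, range(2, 274))
print("Hill spread on 20..50:", spread(paths, 20, 50, 1))
print("t-lgHill spread on 20..50:", spread(paths, 20, 50, 2))
```

The paths start at $k = 2$ because for $k = 1$ the assumed t-lgHill formula always returns 0. Estimates of the tail index $\alpha$ are the reciprocals of the returned values.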

Fig. 1: Plots of the t-lgHill and Hill estimators as functions of the sample fraction for the concentration of arsenic in drinking water.