Abstract
Extreme value theory is widely used in the applied sciences, including finance, economics, hydrology and many other disciplines. In univariate extreme value theory, we model the data by a suitable distribution from the general max-domain of attraction characterized by its tail index; there are three broad classes of tails: the Pareto type, the Weibull type and the Gumbel type. The simplest and most common estimator of the tail index is the Hill estimator, which works only for Pareto-type tails and has a high bias; it is also highly non-robust in the presence of outliers with respect to the assumed model. There have been some recent attempts to produce asymptotically unbiased or robust alternatives to the Hill estimator; however, each of the robust alternatives works for only one type of tail. This paper proposes a new general estimator of the tail index that is robust and has smaller bias than the existing robust estimators under all three tail types. It is essentially a robust generalization of the estimator proposed by Matthys and Beirlant (Stat Sin 13:853–880, 2003), obtained under the same model approximation through a suitable exponential regression framework using the density power divergence. The robustness properties of the estimator are derived in the paper, along with an extensive simulation study. A method for bias correction is also proposed, with application to some real data examples.
References
Alfons A, Holzer J, Templ M (2013a) laeken: estimation of indicators on social exclusion and poverty. http://CRAN.R-project.org/package=laeken, R Package Version 0.4.4
Alfons A, Kraft S (2012) simPopulation: simulation of synthetic populations for surveys based on sample data. http://CRAN.R-project.org/package=simPopulation, R Package version 0.4.0
Alfons A, Kraft S, Templ M, Filzmoser P (2011a) Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat Methods Appl 20(3):383–407
Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25
Alfons A, Templ M, Filzmoser P, Holzer J (2011b) Robust Pareto tail modeling for the estimation of indicators on social exclusion using the R Package laeken. Research report CS-2011-2, Department of Statistics and Probability Theory, Vienna University of Technology
Alfons A, Templ M, Filzmoser P (2013b) Robust estimation of economic indicators from survey samples based on Pareto tail modelling. J R Stat Soc Ser C 62(2):271–286
Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559
Beirlant J, Dierckx G, Goegebeur Y, Matthys G (1999) Tail index estimation and an exponential regression model. Extremes 2(2):177–200
Beirlant J, Vynckier P, Teugels JL (1996) Tail index estimation, Pareto quantile plots, and regression diagnostics. J Am Stat Assoc 91(436):1659–1667
de Haan LFM (1970) On regular variation and its application to the weak convergence of sample extremes. In: Mathematical Centre tracts, Mathematisch Centrum
Dekkers ALM, Einmahl JHJ, de Haan L (1989) A moment estimator for the index of an extreme value distribution. Ann Stat 17:1833–1855
Ghosh A, Basu A (2013) Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electron J Stat 7:2420–2456
Gnedenko BV (1943) Sur la distribution limite du terme maximum d’une série aléatoire. Ann Math 44:423–453
Goegebeur Y, Guillou A, Rietsch T (2014) Robust conditional Weibull-type estimation. Ann Inst Stat Math 67:479–514
Hampel FR (1968) Contributions to the theory of robust estimation. Ph. D. thesis, University of California, Berkeley, USA
Hampel FR (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69:383–393
Hill BM (1975) A simple general approach to inference about the tail of a distribution. Ann Stat 3:1163–1174
Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349
Hulliger B, Alfons A, Bruch C, Filzmoser P, Graf M, Kolb J-P, Lehtonen R, Lussmann D, Meraner A, Münnich R, Nedyalkova D, Schoch T, Templ M, Valaste M, Veijanen A, Zins S (2011) Report on the simulation results: deliverable D7.1, AMELI project
Kim M, Lee S (2008) Estimation of a tail index based on minimum density power divergence. J Multivar Anal 99:2453–2471
Marazzi A, Yohai V (2004) Adaptively truncated maximum likelihood regression with asymmetric errors. J Stat Plan Inference 122:271–291
Matthys G, Beirlant J (2003) Estimating the extreme value index and high quantiles with exponential regression models. Stat Sin 13:853–880
Pak RJ (2013) A robust estimation for the composite lognormal-Pareto model. Commun Stat Appl Methods 20(4):311–320
Pickands J III (1975) Statistical inference using extreme order statistics. Ann Stat 3:119–131
Smith RL (1987) Estimating tails of probability distributions. Ann Stat 15:1174–1207
Stigler SM (1977) Do robust estimators work with real data? Ann Stat 5:1055–1098
Vandewalle B, Beirlant J, Hubert M (2004) A robust estimator of the tail index based on an exponential regression model. In: Hubert M, Pison G, Struyf A, van Aelst S (eds) Theory and applications of recent robust methods. Birkhäuser, Basel, pp 367–376
Vandewalle B, Beirlant J, Christmann A, Hubert M (2007) A robust estimator for the tail index of Pareto-type distributions. Comput Stat Data Anal 51(12):6252–6268
Acknowledgments
The author would like to thank Prof. Ayanendranath Basu of the Indian Statistical Institute, India, for his valuable comments about this work and Prof. Peter Filzmoser of Vienna University of Technology, Austria for kindly providing the dataset used in Example 3 of Sect. 6. The author also wishes to thank two anonymous referees for their remarks that have led to an improved version of the paper.
Appendices
Appendix 1: Assumptions (A1)–(A7) of Ghosh and Basu (2013) under the Assumed Exponential Regression Model
We assume the set-up of an exponential regression model, where the random variables \(W_1, \ldots , W_n, \ldots \) are independent and, for each j, \(W_j\) follows an exponential distribution with mean \(\theta _j = \frac{\gamma }{1 - \left( \frac{j}{k+1}\right) ^\gamma }\). In this paper we have approximated the distribution of the transformed variable \(Y_j\), as defined in Sect. 2, by the distribution of \(W_j\) for each \(j=1, \ldots , k-1\). We now present a brief argument to show that Assumptions (A1)–(A7) of Ghosh and Basu (2013), required for the consistency and asymptotic normality of the MDPDE, hold under the present exponential regression model.
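The set-up above can be illustrated with a short simulation. This is a minimal sketch, not the paper's code: the function names `exp_regression_means` and `simulate_w` are hypothetical, and the value used at \(\gamma = 0\) is the continuous limit of \(\theta _j\), which is an assumption made here for convenience.

```python
import numpy as np


def exp_regression_means(gamma, k):
    """Means theta_j = gamma / (1 - (j/(k+1))**gamma) for j = 1, ..., k-1."""
    j = np.arange(1, k)
    log_u = np.log(j / (k + 1.0))
    if gamma == 0.0:
        # Assumption: use the continuous limit of theta_j as gamma -> 0.
        return -1.0 / log_u
    # 1 - u**gamma is written as -expm1(gamma * log u) for numerical stability.
    return gamma / (-np.expm1(gamma * log_u))


def simulate_w(gamma, k, rng):
    """Draw independent W_j ~ Exponential(mean theta_j), j = 1, ..., k-1."""
    theta = exp_regression_means(gamma, k)
    return rng.exponential(scale=theta), theta


rng = np.random.default_rng(0)
w, theta = simulate_w(gamma=0.5, k=2000, rng=rng)
# The rescaled variables W_j / theta_j should behave like standard exponentials,
# so their sample mean should be close to 1.
print(np.mean(w / theta))
```

Note that \(\theta _j\) is positive for either sign of \(\gamma \), since the numerator and denominator change sign together.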
First note that Assumptions (A1)–(A3) and (A5) hold directly from the form of an exponential distribution function. Next, as shown in the proof of Theorem 1, the matrix \(J^{(i)}\), as per the notation of Ghosh and Basu (2013), is a positive scalar given by
So, the matrix \(\varPsi _n\) is in fact a positive scalar with \(\lambda _0 = \lim \limits _{k\rightarrow \infty }\varPsi _n = a_\gamma >0\); this implies that (A4) also holds. Finally, we need to prove the three limiting statements of Assumptions (A6) and (A7) of Ghosh and Basu (2013). We only present the proof of the first one, namely (noting that we are dealing with a scalar parameter \(\gamma \) here)
Here \(\nabla _g\) represents the derivative with respect to our parameter of interest \(\gamma \). The proofs of the others are similar and hence omitted.
To prove (17), note that under this present model, we have for each i,
where \(C_i= (1+\alpha )\widetilde{J}_\alpha \left( \frac{i}{k+1}\right) \theta _i\) and \(\psi (w) = \frac{\alpha }{(1+\alpha )^2} + (w-1) e^{-\alpha w}\). However, letting \(W_i^* = \frac{W_i}{\theta _i}\), we get that \(W_1^*, \ldots , W_{k-1}^*\) are independent and identically distributed observations from a standard exponential distribution with mean 1. So, we have
However, it is easy to check that both the terms \(\left( \frac{1}{k-1} \sum _{i=1}^{k-1}|C_i|\right) \) and \(\left( \max _{1\le i \le k-1}|C_i|\right) \) are bounded as \(k\rightarrow \infty \). Thus, by the Dominated Convergence Theorem, we have
and hence (17) holds.
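Two properties of the function \(\psi \) defined above make the dominated convergence step plausible: for \(\alpha > 0\) it is bounded on \([0, \infty )\), and it has mean zero under the standard exponential distribution. The following is a minimal numerical sketch of both facts; the helper names are hypothetical, the bound is a simple one obtained from the triangle inequality, and the integral is approximated by a plain trapezoidal rule on a truncated range.

```python
import numpy as np


def psi(w, alpha):
    """psi(w) = alpha / (1 + alpha)^2 + (w - 1) * exp(-alpha * w)."""
    return alpha / (1.0 + alpha) ** 2 + (w - 1.0) * np.exp(-alpha * w)


def psi_bound(alpha):
    """A simple upper bound for |psi| on w >= 0, valid for alpha > 0.

    On w >= 0, (w - 1) * exp(-alpha * w) lies between -1 (at w = 0) and
    exp(-(1 + alpha)) / alpha (at w = 1 + 1/alpha).
    """
    return alpha / (1.0 + alpha) ** 2 + max(1.0, np.exp(-(1.0 + alpha)) / alpha)


def mean_psi_standard_exponential(alpha, upper=60.0, n=600001):
    """Trapezoidal approximation of E[psi(W*)] for W* ~ Exponential(1)."""
    w = np.linspace(0.0, upper, n)
    f = psi(w, alpha) * np.exp(-w)  # integrand: psi(w) times the Exp(1) density
    dw = w[1] - w[0]
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * dw)


for alpha in (0.1, 0.5, 1.0):
    grid = np.linspace(0.0, 200.0, 200001)
    print(alpha, np.max(np.abs(psi(grid, alpha))), psi_bound(alpha),
          mean_psi_standard_exponential(alpha))
```

The mean-zero property also follows in closed form, since \(E[(W^*-1)e^{-\alpha W^*}] = \frac{1}{(1+\alpha )^2} - \frac{1}{1+\alpha } = -\frac{\alpha }{(1+\alpha )^2}\) for a standard exponential \(W^*\).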
Appendix 2: Some comments on the estimator proposed by Vandewalle et al. (2004)
Vandewalle et al. (2004) took an interesting and practically important step by justifying the need to combine two apparently contradictory theories, extreme value statistics and robust statistics, primarily for Pareto-type tails (\(\gamma >0\)). They used the robust regression method proposed by Marazzi and Yohai (2004) and the exponential regression model developed in Beirlant et al. (1999), given by
where \(g_j\) are independent and identically distributed standard exponential random variables. The proposed estimator was examined through an interesting real data example where its robustness was illustrated clearly.
While developing the robust estimator, Vandewalle et al. (2004) transformed the above model into a linear form given by Eq. (3.1) of their paper, which reads
where \(e_j = g_j -1\). Our first doubt arises here: the right-hand sides of Eqs. (18) and (19) are not equal; the closest form to the second that equals the first is
So the reason for dropping \(g_j\) from the second term needs to be clarified. After assuming the linearized form (19), they re-parametrized it as
where \( t_j = \left( \frac{j}{k+1}\right) ^{-\rho }\), \(\theta _1 = \gamma \), \(\theta _2= b_{n,k}\) and \(\sigma =\gamma \). Then, for the case \(\gamma > 0\), they used the robust regression method proposed by Marazzi and Yohai (2004) to estimate the parameters \((\theta _1, ~\theta _2, ~\sigma )\). This regression method has a high breakdown point and high efficiency in the usual regression set-up, which they cited as the motivation for their robust estimator of \(\gamma \); however, the approach is computationally complicated. Moreover, under the transformed set-up (20), \(\theta _1 = \sigma \); this constraint must be respected when solving for the estimator numerically and may affect the properties of the resulting estimator. This needs to be examined extensively through simulation or theoretical results, which were missing from the work of Vandewalle et al. (2004). They also noted a similar limitation of their work and commented in the “conclusion” that they would consider these issues in future work. In view of all these doubts, we have decided not to include this proposal in our simulation studies.
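The discrepancy between the model and its linearized form can be made explicit with a short piece of algebra. This is a sketch only: the response symbol \(Z_j\) and the shorthand \(t_j = \left( \frac{j}{k+1}\right) ^{-\rho }\) are notational assumptions, and the displayed forms are reconstructed from the definitions given in this appendix rather than quoted from Vandewalle et al. (2004).

```latex
% Z_j : generic response of the exponential regression model (assumed notation)
% t_j = (j/(k+1))^{-\rho},  e_j = g_j - 1,  g_j i.i.d. standard exponential
\begin{align*}
Z_j &= \left(\gamma + b_{n,k}\, t_j\right) g_j \\
    &= \gamma + b_{n,k}\, t_j\, g_j + \gamma\,(g_j - 1) \\
    &= \gamma + b_{n,k}\, t_j\, g_j + \gamma\, e_j , \\[4pt]
\text{whereas the linearized form is}\quad
Z_j &= \gamma + b_{n,k}\, t_j + \gamma\, e_j .
\end{align*}
% The two right-hand sides differ by b_{n,k} t_j (g_j - 1) = b_{n,k} t_j e_j,
% which has mean zero but is not identically zero.
```

Under these assumptions the dropped term \(b_{n,k}\, t_j\, e_j\) leaves the mean of the response unchanged but alters its variance, which is one way to read the doubt raised above.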
Cite this article
Ghosh, A. Divergence based robust estimation of the tail index through an exponential regression model. Stat Methods Appl 26, 181–213 (2017). https://doi.org/10.1007/s10260-016-0364-9
DOI: https://doi.org/10.1007/s10260-016-0364-9