Divergence based robust estimation of the tail index through an exponential regression model

  • Original Paper
  • Statistical Methods & Applications

Abstract

Extreme value theory is widely used in the applied sciences, including finance, economics, hydrology and many other disciplines. In univariate extreme value theory, the data are modeled by a suitable distribution from the general max-domain of attraction, characterized by its tail index; there are three broad classes of tails: the Pareto type, the Weibull type and the Gumbel type. The simplest and most common estimator of the tail index is the Hill estimator, which works only for Pareto-type tails and has a high bias; it is also highly non-robust in the presence of outliers with respect to the assumed model. There have been some recent attempts to produce asymptotically unbiased or robust alternatives to the Hill estimator; however, each of the robust alternatives works for only one type of tail. This paper proposes a new general estimator of the tail index that is robust and has smaller bias than the existing robust estimators under all three tail types. It is essentially a robust generalization of the estimator proposed by Matthys and Beirlant (Stat Sin 13:853–880, 2003), obtained under the same model approximation through a suitable exponential regression framework using the density power divergence. The robustness properties of the estimator are derived, and an extensive simulation study is presented. A method for bias correction is also proposed, with applications to some real data examples.

References

  • Alfons A, Holzer J, Templ M (2013a) laeken: estimation of indicators on social exclusion and poverty. http://CRAN.R-project.org/package=laeken, R Package Version 0.4.4

  • Alfons A, Kraft S (2012) simPopulation: simulation of synthetic populations for surveys based on sample data. http://CRAN.R-project.org/package=simPopulation, R Package version 0.4.0

  • Alfons A, Kraft S, Templ M, Filzmoser P (2011a) Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat Methods Appl 20(3):383–407

  • Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25

  • Alfons A, Templ M, Filzmoser P, Holzer J (2011b) Robust Pareto tail modeling for the estimation of indicators on social exclusion using the R Package laeken. Research report CS-2011-2, Department of Statistics and Probability Theory, Vienna University of Technology

  • Alfons A, Templ M, Filzmoser P (2013b) Robust estimation of economic indicators from survey samples based on Pareto tail modelling. J R Stat Soc Ser C 62(2):271–286

  • Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559

  • Beirlant J, Dierckx G, Goegebeur Y, Matthys G (1999) Tail index estimation and an exponential regression model. Extremes 2(2):177–200

  • Beirlant J, Vynckier P, Teugels JL (1996) Tail index estimation, Pareto quantile plots, and regression diagnostics. J Am Stat Assoc 91(436):1659–1667

  • de Haan LFM (1970) On regular variation and its application to the weak convergence of sample extremes. In: Mathematical Centre tracts, Mathematisch Centrum

  • Dekkers ALM, Einmahl JHJ, de Haan L (1989) A moment estimator for the index of an extreme value distribution. Ann Stat 17:1833–1855

  • Ghosh A, Basu A (2013) Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electron J Stat 7:2420–2456

  • Gnedenko BV (1943) Sur la distribution limite du terme maximum d’une série aléatoire. Ann Math 44:423–453

  • Goegebeur Y, Guillou A, Rietsch T (2014) Robust conditional Weibull-type estimation. Ann Inst Stat Math 67:479–514

  • Hampel FR (1968) Contributions to the theory of robust estimation. Ph. D. thesis, University of California, Berkeley, USA

  • Hampel FR (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69:383–393

  • Hill BM (1975) A simple general approach to inference about the tail of a distribution. Ann Stat 3:1163–1174

  • Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349

  • Hulliger B, Alfons A, Bruch C, Filzmoser P, Graf M, Kolb J-P, Lehtonen R, Lussmann D, Meraner A, Münnich R, Nedyalkova D, Schoch T, Templ M, Valaste M, Veijanen A, Zins S (2011) Report on the simulation results: deliverable D7.1, AMELI project

  • Kim M, Lee S (2008) Estimation of a tail index based on minimum density power divergence. J Multivar Anal 99:2453–2471

  • Marazzi A, Yohai V (2004) Adaptively truncated maximum likelihood regression with asymmetric errors. J Stat Plan Inference 122:271–291

  • Matthys G, Beirlant J (2003) Estimating the extreme value index and high quantiles with exponential regression models. Stat Sin 13:853–880

  • Pak RJ (2013) A robust estimation for the composite lognormal-Pareto model. Commun Stat Appl Methods 20(4):311–320

  • Pickands J III (1975) Statistical inference using extreme order statistics. Ann Stat 3:119–131

  • Smith RL (1987) Estimating tails of probability distributions. Ann Stat 15:1174–1207

  • Stigler SM (1977) Do robust estimators work with real data? Ann Stat 5:1055–1098

  • Vandewalle B, Beirlant J, Hubert M (2004) A robust estimator of the tail index based on an exponential regression model. In: Hubert M, Pison G, Struyf A, van Aelst S (eds) Theory and applications of recent robust methods. Birkhäuser, Basel, pp 367–376

  • Vandewalle B, Beirlant J, Christmann A, Hubert M (2007) A robust estimator for the tail index of Pareto-type distributions. Comput Stat Data Anal 51(12):6252–6268

Acknowledgments

The author would like to thank Prof. Ayanendranath Basu of the Indian Statistical Institute, India, for his valuable comments about this work and Prof. Peter Filzmoser of Vienna University of Technology, Austria for kindly providing the dataset used in Example 3 of Sect.  6. The author also wishes to thank two anonymous referees for their remarks that have led to an improved version of the paper.

Author information

Corresponding author

Correspondence to Abhik Ghosh.

Appendices

Appendix 1: Assumptions (A1)–(A7) of Ghosh and Basu (2013) under the Assumed Exponential Regression Model

We assume the set-up of an exponential regression model, where the random variables \(W_1, \ldots , W_n, \ldots \) are independent, but for each j, \(W_j\) follows an exponential distribution with mean \(\theta _j = \frac{\gamma }{1 - \left( \frac{j}{k+1}\right) ^\gamma }\). In this paper we have approximated the distribution of the transformed variable \(Y_j\), as defined in Sect.  2, by the distribution of \(W_j\) for each \(j=1, \ldots , k-1\). Now we will present a brief argument to show that Assumptions (A1)–(A7) of Ghosh and Basu (2013), required for asymptotic consistency and normality of the MDPDE, hold under the present set-up of exponential regression model.
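Before turning to the assumptions, the set-up above can be illustrated with a minimal Python sketch. It simulates independent \(W_j\) with means \(\theta _j\) and then minimizes, over a grid of \(\gamma \) values, the average density power divergence loss for exponential densities, \(\theta _j^{-\alpha }\left\{ \frac{1}{1+\alpha } - \left( 1+\frac{1}{\alpha }\right) e^{-\alpha W_j/\theta _j}\right\} \), which follows from the general form in Basu et al. (1998). The function names, the values \(k=2000\), \(\gamma =0.5\) and \(\alpha =0.3\), the seed and the grid search are assumptions made here for illustration only; they are not taken from the paper, and the numerical procedure used by the author may well differ.

```python
import math
import random


def theta(j, k, gamma):
    """Mean of W_j: gamma / (1 - (j/(k+1))**gamma), using the limit as gamma -> 0."""
    u = j / (k + 1)
    if abs(gamma) < 1e-12:
        return -1.0 / math.log(u)  # limiting value of the mean at gamma = 0
    return gamma / (1.0 - u ** gamma)


def simulate_w(k, gamma, rng):
    """Draw independent W_1, ..., W_{k-1}, with W_j exponential with mean theta_j."""
    return [rng.expovariate(1.0 / theta(j, k, gamma)) for j in range(1, k)]


def dpd_objective(w, k, gamma, alpha):
    """Average density power divergence loss for exponential densities (alpha > 0)."""
    total = 0.0
    for j, wj in enumerate(w, start=1):
        th = theta(j, k, gamma)
        total += th ** (-alpha) * (
            1.0 / (1.0 + alpha)
            - (1.0 + 1.0 / alpha) * math.exp(-alpha * wj / th)
        )
    return total / len(w)


def mdpde_grid(w, k, alpha, grid):
    """Grid-search minimizer of the objective (illustrative, not the paper's solver)."""
    return min(grid, key=lambda g: dpd_objective(w, k, g, alpha))


# Hypothetical settings chosen only for this illustration.
rng = random.Random(0)
k, gamma_true, alpha = 2000, 0.5, 0.3
w = simulate_w(k, gamma_true, rng)
grid = [0.05 + 0.01 * i for i in range(196)]  # gamma values from 0.05 to 2.00
gamma_hat = mdpde_grid(w, k, alpha, grid)
print("grid-search estimate of gamma:", round(gamma_hat, 2))
```

A grid search is used here only because it is simple and dependency-free; any one-dimensional optimizer could replace it.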

First note that Assumptions (A1)–(A3) and (A5) hold directly from the form of the exponential distribution function. Next, as shown in the proof of Theorem 1, the matrix \(J^{(i)}\), in the notation of Ghosh and Basu (2013), is a positive scalar given by

$$\begin{aligned} J^{(i)} = \frac{(1+\alpha ^2)}{(1+\alpha )^3} \widetilde{J}\left( \frac{i}{k+1}\right) \theta _i^{\alpha -2}. \end{aligned}$$

So, the matrix \(\varPsi _n\) is in fact a positive scalar with \(\lambda _0 = \lim \limits _{k\rightarrow \infty }\varPsi _n = a_\gamma >0\); this implies that (A4) also holds. Finally, we need to prove the three limiting statements in Assumptions (A6) and (A7) of Ghosh and Basu (2013). We present the proof of only the first one, namely (noting that we are dealing with the scalar parameter \(\gamma \) here)

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{k>1} \left\{ \frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |\nabla _{g}V_i(W_i;\theta )| I(|\nabla _{g}V_i(W_i;\theta )| > N)\right] \right\} = 0. \end{aligned}$$
(17)

Here \(\nabla _g\) represents the derivative with respect to our parameter of interest \(\gamma \). The proofs of the other two statements are similar and hence omitted.

To prove (17), note that under this present model, we have for each i,

$$\begin{aligned} \nabla _{g}V_i(W_i;\theta ) = C_i \left[ \frac{\alpha }{(1+\alpha )^2} + \left( \frac{W_i}{\theta _i} - 1\right) e^{-\frac{\alpha W_i}{\theta _i}}\right] = C_i \psi \left( \frac{W_i}{\theta _i}\right) , \end{aligned}$$

where \(C_i= (1+\alpha )\widetilde{J}_\alpha \left( \frac{i}{k+1}\right) \theta _i\) and \(\psi (w) = \frac{\alpha }{(1+\alpha )^2} + (w-1) e^{-\alpha w}\). However, letting \(W_i^* = \frac{W_i}{\theta _i}\), we get that \(W_1^*, \ldots , W_{k-1}^*\) are independent and identically distributed observations from a standard exponential distribution with mean 1. So, we have

$$\begin{aligned}&\frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |\nabla _{g}V_i(W_i;\theta )| I(|\nabla _{g}V_i(W_i;\theta )|> N)\right] \\&\quad = \frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |C_i| \left| \psi \left( \frac{W_i}{\theta _i}\right) \right| I\left( |C_i| \left| \psi \left( \frac{W_i}{\theta _i}\right) \right|> N\right) \right] \\&\quad \le \frac{1}{k-1} \sum _{i=1}^{k-1} |C_i| E\left[ \left| \psi (W_i^*)\right| I\left( \left| \psi (W_i^*)\right|> \frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] \\&\quad = E\left[ \left| \psi (W_1^*)\right| I\left( \left| \psi (W_1^*)\right| >\frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] \left( \frac{1}{k-1} \sum _{i=1}^{k-1}|C_i|\right) . \end{aligned}$$

However, it is easy to check that both the terms \(\left( \frac{1}{k-1} \sum _{i=1}^{k-1}|C_i|\right) \) and \(\left( \max _{1\le i \le k-1}|C_i|\right) \) are bounded as \(k\rightarrow \infty \). Thus, since \(E\left| \psi (W_1^*)\right| < \infty \), the Dominated Convergence Theorem gives

$$\begin{aligned} \lim \limits _{N\rightarrow \infty } ~ E\left[ \left| \psi (W_1^*)\right| I\left( \left| \psi (W_1^*)\right| >\frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] =0, \end{aligned}$$

and hence (17) holds.
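The key quantity in this argument, the tail expectation of \(|\psi (W_1^*)|\) under a standard exponential distribution, can be checked numerically. The Python sketch below approximates \(E[\psi (W_1^*)]\) and \(E[|\psi (W_1^*)| I(|\psi (W_1^*)| > t)]\) for a few thresholds \(t\) using a plain trapezoidal rule. The value \(\alpha = 0.5\), the thresholds, the integration range and the helper names are hypothetical choices made only for this illustration. For \(\alpha > 0\) the function \(\psi \) is bounded on \([0, \infty )\), so the tail expectation is exactly zero once \(t\) exceeds \(\sup |\psi |\).

```python
import math


def psi(w, alpha):
    """psi(w) = alpha / (1 + alpha)^2 + (w - 1) * exp(-alpha * w)."""
    return alpha / (1.0 + alpha) ** 2 + (w - 1.0) * math.exp(-alpha * w)


def expect_std_exp(h, upper=60.0, n=50_000):
    """Approximate E[h(W)] for W ~ Exp(1) with the trapezoidal rule on [0, upper]."""
    step = upper / n
    total = 0.5 * (h(0.0) + h(upper) * math.exp(-upper))
    for i in range(1, n):
        x = i * step
        total += h(x) * math.exp(-x)
    return total * step


def tail_expectation(alpha, threshold):
    """Approximate E[|psi(W)| * I(|psi(W)| > threshold)] for W ~ Exp(1)."""
    def integrand(x):
        value = abs(psi(x, alpha))
        return value if value > threshold else 0.0
    return expect_std_exp(integrand)


alpha = 0.5  # hypothetical tuning parameter
mean_psi = expect_std_exp(lambda x: psi(x, alpha))
thresholds = (0.0, 0.25, 0.5, 1.0)
tails = [tail_expectation(alpha, t) for t in thresholds]

print("E[psi(W)] is approximately", mean_psi)
for t, value in zip(thresholds, tails):
    print("threshold", t, "tail expectation", value)
```

The first printed value should be close to zero, consistent with \(E[(W-1)e^{-\alpha W}] = -\alpha /(1+\alpha )^2\) for a standard exponential \(W\), and the tail expectations decrease to zero as the threshold grows.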

Appendix 2: Some comments on the estimator proposed by Vandewalle et al. (2004)

Vandewalle et al. (2004) took an interesting and practically important step by arguing for the need to combine two apparently contradictory theories, extreme value statistics and robust statistics, primarily for Pareto-type tails (\(\gamma >0\)). They used the robust regression method proposed by Marazzi and Yohai (2004) together with the exponential regression model developed in Beirlant et al. (1999), given by

$$\begin{aligned} Y_j \sim _d \left( \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho }\right) g_j, \quad j=1, \ldots , k, \end{aligned}$$
(18)

where the \(g_j\) are independent and identically distributed standard exponential random variables. The proposed estimator was examined through an interesting real data example, where its robustness was clearly illustrated.

While developing the robust estimator, Vandewalle et al. (2004) transformed the above model into a linear form, given by Eq. (3.1) of their paper, which reads

$$\begin{aligned} Y_j \sim _d \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho } + \gamma e_j, \quad j=1, \ldots , k, \end{aligned}$$
(19)

where \(e_j = g_j -1\). Our first doubt arises here: the right-hand sides of Eqs. (18) and (19) are not equal. Since \(g_j = 1 + e_j\), the closest form to (19) that equals (18) is

$$\begin{aligned} \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho }g_j + \gamma e_j. \end{aligned}$$

So the reason for dropping \(g_j\) from the second term needs to be clarified. After assuming the linearized form (19), they re-parametrized it as

$$\begin{aligned} Y_j = \theta _1 + \theta _2t_j + \sigma e_j, \quad j=1, \ldots , k, \end{aligned}$$
(20)

where \( t_j = \left( \frac{j}{k+1}\right) ^{-\rho }\), \(\theta _1 = \gamma \), \(\theta _2= b_{n,k}\) and \(\sigma =\gamma \). Then, for the case \(\gamma > 0\), they used the robust regression method proposed by Marazzi and Yohai (2004) to estimate the parameters \((\theta _1, ~\theta _2, ~\sigma )\). This regression method has a high breakdown point and high efficiency in the usual regression set-up, which is why they adopted it for their robust estimator of \(\gamma \); however, the approach is computationally complicated. Moreover, under the transformed set-up (20), we have \(\theta _1 = \sigma \); this constraint needs to be taken into account while solving for the estimator numerically, and it may affect the properties of the resulting estimator. This needs to be examined extensively through simulation or theoretical results, which were missing from the work of Vandewalle et al. (2004). They also noted a similar limitation of their work and commented in their “conclusion” that they would consider these issues in future work. In view of all these doubts, we have decided not to include this proposal in our simulation studies.
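The difference between the multiplicative form (18) and the linearized form (19) can be made concrete with a small Monte Carlo sketch in Python. For a single index \(j\), write \(t\) for the regressor value multiplying \(b_{n,k}\); both right-hand sides then have mean \(\gamma + b_{n,k}t\), but their variances are \((\gamma + b_{n,k}t)^2\) and \(\gamma ^2\) respectively, and their difference is exactly \(b_{n,k}t(g_j - 1)\). The parameter values, the seed and the function names below are hypothetical, chosen only for illustration.

```python
import random
import statistics


def rhs_18(gamma, b, t, g):
    """Multiplicative form (18): (gamma + b * t) * g."""
    return (gamma + b * t) * g


def rhs_19(gamma, b, t, g):
    """Linearized form (19): gamma + b * t + gamma * (g - 1)."""
    return gamma + b * t + gamma * (g - 1.0)


# Hypothetical values for gamma, b_{n,k} and the regressor t at one index j.
gamma, b, t = 0.5, 0.8, 1.2
rng = random.Random(1)
draws = [rng.expovariate(1.0) for _ in range(200_000)]  # standard exponential g

y18 = [rhs_18(gamma, b, t, g) for g in draws]
y19 = [rhs_19(gamma, b, t, g) for g in draws]

print("means:    ", statistics.fmean(y18), statistics.fmean(y19))
print("variances:", statistics.pvariance(y18), statistics.pvariance(y19))
```

Under these assumed values the two simulated means agree closely while the variances differ markedly, which is one way to see that dropping \(g_j\) from the second term changes the error structure of the model.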

Cite this article

Ghosh, A. Divergence based robust estimation of the tail index through an exponential regression model. Stat Methods Appl 26, 181–213 (2017). https://doi.org/10.1007/s10260-016-0364-9

  • DOI: https://doi.org/10.1007/s10260-016-0364-9
