Divergence based robust estimation of the tail index through an exponential regression model

Ghosh, Abhik

doi:10.1007/s10260-016-0364-9

Original Paper
Published: 18 July 2016

Volume 26, pages 181–213, (2017)
Statistical Methods & Applications

Abhik Ghosh ORCID: orcid.org/0000-0003-3688-4584¹

Abstract

Extreme value theory is widely used in applied sciences including finance, economics, hydrology and many other disciplines. In univariate extreme value theory, we model the data by a suitable distribution from the general max-domain of attraction characterized by its tail index; there are three broad classes of tails: the Pareto type, the Weibull type and the Gumbel type. The simplest and most common estimator of the tail index is the Hill estimator, which works only for Pareto type tails and has a high bias; it is also highly non-robust in the presence of outliers with respect to the assumed model. There have been some recent attempts to produce asymptotically unbiased or robust alternatives to the Hill estimator; however, each of the robust alternatives works for only one type of tail. This paper proposes a new general estimator of the tail index that is both robust and has smaller bias under all three tail types compared to the existing robust estimators. This essentially produces a robust generalization of the estimator proposed by Matthys and Beirlant (Stat Sin 13:853–880, 2003) under the same model approximation through a suitable exponential regression framework using the density power divergence. The robustness properties of the estimator are derived in the paper along with an extensive simulation study. A method for bias correction is also proposed with application to some real data examples.

References

Alfons A, Holzer J, Templ M (2013a) laeken: estimation of indicators on social exclusion and poverty. http://CRAN.R-project.org/package=laeken, R Package Version 0.4.4
Alfons A, Kraft S (2012) simPopulation: simulation of synthetic populations for surveys based on sample data. http://CRAN.R-project.org/package=simPopulation, R Package version 0.4.0
Alfons A, Kraft S, Templ M, Filzmoser P (2011a) Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat Methods Appl 20(3):383–407
Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25
Alfons A, Templ M, Filzmoser P, Holzer J (2011b) Robust Pareto tail modeling for the estimation of indicators on social exclusion using the R Package laeken. Research report CS-2011-2, Department of Statistics and Probability Theory, Vienna University of Technology
Alfons A, Templ M, Filzmoser P (2013b) Robust estimation of economic indicators from survey samples based on Pareto tail modelling. J R Stat Soc Ser C 62(2):271–286
Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559
Beirlant J, Dierckx G, Goegebeur Y, Matthys G (1999) Tail index estimation and an exponential regression model. Extremes 2(2):177–200
Beirlant J, Vynckier P, Teugels JL (1996) Tail index estimation, Pareto quantile plots, and regression diagnostics. J Am Stat Assoc 91(436):1659–1667
de Haan LFM (1970) On regular variation and its application to the weak convergence of sample extremes. In: Mathematical Centre tracts, Mathematisch Centrum
Dekkers ALM, Einmahl JHJ, de Haan L (1989) A moment estimator for the index of an extreme value distribution. Ann Stat 17:1833–1855
Ghosh A, Basu A (2013) Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electron J Stat 7:2420–2456
Gnedenko BV (1943) Sur la distribution limite du terme maximum d’une série aléatoire. Ann Math 44:423–453
Goegebeur Y, Guillou A, Rietsch T (2014) Robust conditional Weibull-type estimation. Ann Inst Stat Math 67:479–514
Hampel FR (1968) Contributions to the theory of robust estimation. Ph. D. thesis, University of California, Berkeley, USA
Hampel FR (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69:383–393
Hill BM (1975) A simple general approach to inference about the tail of a distribution. Ann Stat 3:1163–1174
Hosking JRM, Wallis JR (1987) Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29:339–349
Hulliger B, Alfons A, Bruch C, Filzmoser P, Graf M, Kolb J-P, Lehtonen R, Lussmann D, Meraner A, Münnich R, Nedyalkova D, Schoch T, Templ M, Valaste M, Veijanen A, Zins S (2011) Report on the simulation results: deliverable D7.1, AMELI project
Kim M, Lee S (2008) Estimation of a tail index based on minimum density power divergence. J Multivar Anal 99:2453–2471
Marazzi A, Yohai V (2004) Adaptively truncated maximum likelihood regression with asymmetric errors. J Stat Plan Inference 122:271–291
Matthys G, Beirlant J (2003) Estimating the extreme value index and high quantiles with exponential regression models. Stat Sin 13:853–880
Pak RJ (2013) A robust estimation for the composite lognormal-Pareto model. Commun Stat Appl Methods 20(4):311–320
Pickands J III (1975) Statistical inference using extreme order statistics. Ann Stat 3:119–131
Smith RL (1987) Estimating tails of probability distributions. Ann Stat 15:1174–1207
Stigler SM (1977) Do robust estimators work with real data? Ann Stat 5:1055–1098
Vandewalle B, Beirlant J, Hubert M (2004) A robust estimator of the tail index based on an exponential regression model. In: Hubert M, Pison G, Struyf A, van Aelst S (eds) Theory and applications of recent robust methods. Birkhäuser, Basel, pp 367–376
Vandewalle B, Beirlant J, Christmann A, Hubert M (2007) A robust estimator for the tail index of Pareto-type distributions. Comput Stat Data Anal 51(12):6252–6268

Acknowledgments

The author would like to thank Prof. Ayanendranath Basu of the Indian Statistical Institute, India, for his valuable comments about this work and Prof. Peter Filzmoser of Vienna University of Technology, Austria for kindly providing the dataset used in Example 3 of Sect. 6. The author also wishes to thank two anonymous referees for their remarks that have led to an improved version of the paper.

Author information

Authors and Affiliations

Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
Abhik Ghosh

Corresponding author

Correspondence to Abhik Ghosh.

Appendices

Appendix 1: Assumptions (A1)–(A7) of Ghosh and Basu (2013) under the Assumed Exponential Regression Model

We assume the set-up of an exponential regression model in which the random variables $W_1, \ldots , W_n, \ldots $ are independent and, for each j, $W_j$ follows an exponential distribution with mean $\theta _j = \frac{\gamma }{1 - \left( \frac{j}{k+1}\right) ^\gamma }$. In this paper we have approximated the distribution of the transformed variable $Y_j$, as defined in Sect. 2, by the distribution of $W_j$ for each $j=1, \ldots , k-1$. We now present a brief argument to show that Assumptions (A1)–(A7) of Ghosh and Basu (2013), required for the asymptotic consistency and normality of the MDPDE, hold under the present exponential regression set-up.
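The model just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `erm_means` and `simulate_erm`, the seed and the sample size are hypothetical choices, and the case $\gamma = 0$ (which would need a limiting form of $\theta _j$) is deliberately left out.

```python
import math
import random


def erm_means(gamma, k):
    """Return [theta_1, ..., theta_{k-1}] with theta_j = gamma / (1 - (j/(k+1))**gamma)."""
    if gamma == 0:
        # The formula is 0/0 at gamma = 0; the limiting form is not covered in this sketch.
        raise ValueError("gamma = 0 is not handled here")
    return [gamma / (1.0 - (j / (k + 1)) ** gamma) for j in range(1, k)]


def simulate_erm(gamma, k, rng):
    """Draw independent W_j ~ Exponential(mean theta_j), j = 1, ..., k-1."""
    # expovariate takes a rate, so a unit-rate draw is scaled by the mean theta_j.
    return [theta * rng.expovariate(1.0) for theta in erm_means(gamma, k)]


if __name__ == "__main__":
    rng = random.Random(2016)  # arbitrary seed, for reproducibility only
    w = simulate_erm(gamma=0.5, k=200, rng=rng)
    print(len(w), sum(w) / len(w))
```

Note that $\theta _j$ is positive for negative $\gamma $ as well, since numerator and denominator then change sign together.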

First note that Assumptions (A1)–(A3) and (A5) follow directly from the form of the exponential distribution function. Next, as shown in the proof of Theorem 1, the matrix $J^{(i)}$, in the notation of Ghosh and Basu (2013), is a positive scalar given by

$$\begin{aligned} J^{(i)} = \frac{(1+\alpha ^2)}{(1+\alpha )^3} \widetilde{J}\left( \frac{i}{k+1}\right) \theta _i^{\alpha -2}. \end{aligned}$$

So, the matrix $\varPsi _n$ is in fact a positive scalar with $\lambda _0 = \lim \limits _{k\rightarrow \infty }\varPsi _n = a_\gamma >0$; this implies that (A4) also holds. Finally, we need to prove the three limiting statements of Assumptions (A6) and (A7) of Ghosh and Basu (2013). We present only the proof of the first one, namely (noting that we are dealing with a scalar parameter $\gamma $ here)

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{k>1} \left\{ \frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |\nabla _{g}V_i(W_i;\theta )| I(|\nabla _{g}V_i(W_i;\theta )| > N)\right] \right\} = 0. \end{aligned}$$

(17)

Here $\nabla _g$ represents the derivative with respect to our parameter of interest $\gamma $. The proofs of the other two statements are similar and hence omitted.
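For orientation, the summand $V_i$ appearing in (17) is not written out in this appendix. The sketch below assumes the usual density power divergence summand of Basu et al. (1998) for an exponential density with mean $\theta _i$, namely $V_i(w;\theta _i) = \theta _i^{-\alpha }\left[ \frac{1}{1+\alpha } - \left( 1+\frac{1}{\alpha }\right) e^{-\alpha w/\theta _i}\right] $, and minimises its average over $\gamma $ by a plain grid search on data simulated from the exponential regression model itself. All function names, the grid, the seed and the sample size are illustrative assumptions, and the grid search merely stands in for a proper numerical optimiser.

```python
import math
import random


def theta(gamma, u):
    """Mean of the exponential regression model at position u = j/(k+1)."""
    return gamma / (1.0 - u ** gamma)


def dpd_objective(gamma, y, alpha):
    """Average of the assumed V_j(y_j; theta_j(gamma)) over j = 1, ..., k-1."""
    k = len(y) + 1  # y holds the k-1 observations
    total = 0.0
    for j, yj in enumerate(y, start=1):
        th = theta(gamma, j / (k + 1))
        # th**(-alpha)/(1+alpha) is the integral of f**(1+alpha) for this density.
        total += th ** (-alpha) * (
            1.0 / (1.0 + alpha) - (1.0 + 1.0 / alpha) * math.exp(-alpha * yj / th)
        )
    return total / len(y)


def mdpde_grid(y, alpha, grid):
    """Grid minimiser of the objective (a crude stand-in for an optimiser)."""
    return min(grid, key=lambda g: dpd_objective(g, y, alpha))


rng = random.Random(7)  # arbitrary seed
k, gamma_true = 2001, 0.5
y = [theta(gamma_true, j / (k + 1)) * rng.expovariate(1.0) for j in range(1, k)]
grid = [0.05 + 0.01 * i for i in range(196)]  # gamma from 0.05 to 2.00
gamma_hat = mdpde_grid(y, alpha=0.3, grid=grid)
print(round(gamma_hat, 2))
```

Differentiating the assumed $V_i$ with respect to $\theta _i$ gives a multiple of $\frac{\alpha }{(1+\alpha )^2} + \left( \frac{w}{\theta _i}-1\right) e^{-\alpha w/\theta _i}$, which is the bracketed form used in the proof of (17).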

To prove (17), note that under this present model, we have for each i,

$$\begin{aligned} \nabla _{g}V_i(W_i;\theta ) = C_i \left[ \frac{\alpha }{(1+\alpha )^2} + \left( \frac{W_i}{\theta _i} - 1\right) e^{-\frac{\alpha W_i}{\theta _i}}\right] = C_i \psi \left( \frac{W_i}{\theta _i}\right) , \end{aligned}$$

where $C_i= (1+\alpha )\widetilde{J}_\alpha \left( \frac{i}{k+1}\right) \theta _i$ and $\psi (w) = \frac{\alpha }{(1+\alpha )^2} + (w-1) e^{-\alpha w}$. Moreover, letting $W_i^* = \frac{W_i}{\theta _i}$, we get that $W_1^*, \ldots , W_{k-1}^*$ are independent and identically distributed observations from the standard exponential distribution (mean 1). So, we have

$$\begin{aligned}&\frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |\nabla _{g}V_i(W_i;\theta )| I(|\nabla _{g}V_i(W_i;\theta )|> N)\right] \\&\quad = \frac{1}{k-1} \sum _{i=1}^{k-1} E \left[ |C_i| \left| \psi \left( \frac{W_i}{\theta _i}\right) \right| I\left( |C_i| \left| \psi \left( \frac{W_i}{\theta _i}\right) \right|> N\right) \right] \\&\quad \le \frac{1}{k-1} \sum _{i=1}^{k-1} |C_i| E\left[ \left| \psi (W_i^*)\right| I\left( \left| \psi (W_i^*)\right|> \frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] \\&\quad = E\left[ \left| \psi (W_1^*)\right| I\left( \left| \psi (W_1^*)\right| >\frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] \left( \frac{1}{k-1} \sum _{i=1}^{k-1}|C_i|\right) . \end{aligned}$$

It is easy to check that both the terms $\left( \frac{1}{k-1} \sum _{i=1}^{k-1}|C_i|\right) $ and $\left( \max _{1\le i \le k-1}|C_i|\right) $ are bounded as $k\rightarrow \infty $. Thus, by the Dominated Convergence Theorem, we have

$$\begin{aligned} \lim \limits _{N\rightarrow \infty } ~ E\left[ \left| \psi (W_1^*)\right| I\left( \left| \psi (W_1^*)\right| >\frac{N}{\max _{1\le i \le k-1}|C_i|}\right) \right] =0, \end{aligned}$$

and hence (17) holds.
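The two properties of $\psi $ that drive this argument can be checked numerically. The sketch below, which assumes $\alpha > 0$ and uses hypothetical helper names, approximates $E[\psi (W_1^*)]$ by a trapezoidal rule and evaluates $\sup _{w\ge 0}|\psi (w)|$ from the two candidate extremes of $\psi $ (at $w=0$ and at $w = 1 + 1/\alpha $, where $(w-1)e^{-\alpha w}$ is maximal).

```python
import math


def psi(w, alpha):
    """psi(w) = alpha/(1+alpha)^2 + (w - 1) * exp(-alpha * w)."""
    return alpha / (1.0 + alpha) ** 2 + (w - 1.0) * math.exp(-alpha * w)


def expected_psi(alpha, upper=60.0, steps=200_000):
    """Trapezoidal approximation of E[psi(W*)], W* ~ Exp(1), truncated at `upper`."""
    h = upper / steps
    total = 0.5 * (psi(0.0, alpha) + psi(upper, alpha) * math.exp(-upper))
    for i in range(1, steps):
        w = i * h
        total += psi(w, alpha) * math.exp(-w)
    return total * h


def psi_sup(alpha):
    """Supremum of |psi(w)| over w >= 0, valid for alpha > 0."""
    c = alpha / (1.0 + alpha) ** 2
    at_zero = abs(c - 1.0)                          # |psi(0)|
    at_peak = c + math.exp(-(1.0 + alpha)) / alpha  # psi(1 + 1/alpha)
    return max(at_zero, at_peak)


alpha = 0.5  # illustrative tuning parameter
print(abs(expected_psi(alpha)) < 1e-6)
print(psi_sup(alpha))
```

Since $\psi $ is bounded for $\alpha > 0$, the indicator $I\left( |\psi (W_1^*)| > M\right) $ is identically zero once $M \ge \sup _w |\psi (w)|$, so the truncated expectation vanishes exactly for all sufficiently large $N$.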

Appendix 2: Some comments on the estimator proposed by Vandewalle et al. (2004)

Vandewalle et al. (2004) took an interesting and practically important step by arguing for the need to combine the two apparently contradictory theories of extreme value statistics and robust statistics, primarily for Pareto-type tails ($\gamma >0$). They used the robust regression method proposed by Marazzi and Yohai (2004) and the exponential regression model developed in Beirlant et al. (1999), given by

$$\begin{aligned} Y_j \sim _d \left( \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho }\right) g_j, \quad j=1, \ldots , k, \end{aligned}$$

(18)

where $g_j$ are independent and identically distributed standard exponential random variables. The proposed estimator was examined through an interesting real data example where its robustness was illustrated clearly.

While developing the robust estimator, Vandewalle et al. (2004) transformed the above model into a linear form given by Eq. (3.1) of their paper, which reads

$$\begin{aligned} Y_j \sim _d \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho } + \gamma e_j, \quad j=1, \ldots , k, \end{aligned}$$

(19)

where $e_j = g_j -1$. Our first doubt arises here: the right-hand sides of Eqs. (18) and (19) are not equal. The closest form to (19) that equals (18) is

$$\begin{aligned} \gamma + b_{n,k}\left( \frac{j}{k+1}\right) ^{-\rho }g_j + \gamma e_j. \end{aligned}$$

So, the reason for dropping $g_j$ from the second term needs to be clarified. After assuming the linearized form (19), they re-parametrized it as

$$\begin{aligned} Y_j = \theta _1 + \theta _2t_j + \sigma e_j, \quad j=1, \ldots , k, \end{aligned}$$

(20)

where $ t_j = \left( \frac{j}{k+1}\right) ^{-\rho }$, $\theta _1 = \gamma $, $\theta _2= b_{n,k}$ and $\sigma =\gamma $. Then, for the case $\gamma > 0$, they used the robust regression method proposed by Marazzi and Yohai (2004) to estimate the parameters $(\theta _1, ~\theta _2, ~\sigma )$. They noted that this regression method has a high breakdown point and high efficiency in the usual regression set-up, which motivated its use for the robust estimator of $\gamma $; however, the approach is computationally complicated. Moreover, under the transformed set-up (20), $\theta _1 = \sigma $; this constraint needs to be taken care of while solving for the estimator numerically and may affect the properties of the resulting estimator. This needs to be examined extensively through simulation or theoretical results, which was missing in the work of Vandewalle et al. (2004). They also noted a similar limitation of their work and commented in their “conclusion” that they would consider these issues in future work. In view of all these doubts, we have decided not to include this proposal in our simulation studies.
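The discrepancy between (18) and (19) noted above can be made explicit with a short calculation. Treating $b_{n,k}$ and $t_j$ as fixed constants and using only $e_j = g_j - 1$, $E(g_j) = 1$ and $\mathrm{Var}(g_j) = 1$ for a standard exponential variable, one gets

$$\begin{aligned} \left( \gamma + b_{n,k} t_j\right) g_j&= \gamma g_j + b_{n,k} t_j g_j = \gamma (1+e_j) + b_{n,k} t_j g_j = \gamma + b_{n,k} t_j g_j + \gamma e_j, \\ E\left[ \left( \gamma + b_{n,k} t_j\right) g_j\right]&= \gamma + b_{n,k} t_j = E\left[ \gamma + b_{n,k} t_j + \gamma e_j\right] , \\ \mathrm{Var}\left[ \left( \gamma + b_{n,k} t_j\right) g_j\right]&= \left( \gamma + b_{n,k} t_j\right) ^2, \qquad \mathrm{Var}\left[ \gamma + b_{n,k} t_j + \gamma e_j\right] = \gamma ^2. \end{aligned}$$

So the two forms share the same mean but have different variances unless $b_{n,k} t_j = 0$ or $b_{n,k} t_j = -2\gamma $; in this sense the linearized form (19) keeps the regression function of (18) while replacing its heteroscedastic error by a homoscedastic one.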

About this article

Cite this article

Ghosh, A. Divergence based robust estimation of the tail index through an exponential regression model. Stat Methods Appl 26, 181–213 (2017). https://doi.org/10.1007/s10260-016-0364-9

Accepted: 04 July 2016
Published: 18 July 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10260-016-0364-9
