# Extreme Value Statistics


## Abstract

When studying peaks in electricity demand, we may be interested in understanding the risk of a certain large level of demand being exceeded. For example, there is potential interest in finding the probability that the electricity demand of a business or household exceeds the contractual limit. An alternative, yet in principle equivalent, approach involves assessing the maximal need for electricity over a certain period of time, like a day, a week or a season within a year. This would stem from the potential interest in quantifying the largest electricity consumption for a substation, household or business.

In either case, we are trying to infer extreme traits in electricity loads for a certain region, assumed fairly homogeneous, with the ultimate aim of predicting the likelihood of an extreme event which might never have been observed before. While the exact truth may not be possible to determine, it may be possible to come up with an educated guess (an estimate) and to ascertain confidence margins around it.

In this chapter, we will not only list and describe mainstream statistical methodology for drawing inference on extreme and rare events, but also endeavour to elucidate what sets the semiparametric approach apart from the classical parametric approach and how these two eventually align with one another. For details on possible approaches and related statistical methodologies we refer to [1, 2].

We hope that, through this chapter, practitioners and users of extreme value theory will be able to see how the theoretical results in Chap. 3 translate in practice and how conditions can be checked. We will be mainly concerned with semi-parametric inference for univariate extremes. This statistical methodology builds strongly on the fundamental results expounded in the previous section, most notably the theory of extended regular variation (see e.g. [3], Appendix B).

Despite the numerous approaches whereby extreme values can be statistically analysed, these are generally classed into two main frameworks: methods for maxima over fixed intervals (blocks) and methods for exceedances (peaks) over high thresholds. The former relates to the oldest group of models, arising from the seminal work of [4] as block maxima models to be fitted to the largest observations collected from large samples (blocks) of identically distributed observations. The latter has often been considered the most useful framework in practical applications due to its widely proclaimed advantage over the former of a more efficient use of the often scarce extreme data.

In the block maxima (BM) method, observations are grouped into blocks of equal length (assumed large) and inference is performed under the assumption that the maximum in each block (usually a year) is well approximated by the Generalised Extreme Value (GEV) distribution.

The peaks over threshold (POT) method restricts attention to those observations from the sample that exceed a certain level or threshold, supposedly high, and uses mainstream statistical techniques, such as maximum likelihood or the method of moments, for estimation and hypothesis testing, under the assumption that these excesses (the values by which the threshold is exceeded) follow exactly a Generalised Pareto distribution (GPD).

Application to energy smart meter data is an important part of the challenge, in the sense of the impending application to extreme quantile estimation, i.e. to levels which are exceeded with only a small probability. A point to be wary of when applying EVT is that, due to the wiggliness of the real world and the means by which empirical measurements are collected, observations hardly ever follow an extreme value distribution exactly. A recent application of extreme value theory can be found in [5]. Although the theory underpinning the physical models considered in that paper determines that the true probability distribution generating the data belongs to the Gumbel domain of attraction, the authors attempt statistical verification of this assumption with concomitant estimation of the GEV shape parameter \(\gamma \) via maximum likelihood, in a purely parametric approach. They find an estimate of \(-0.002\) which, taken at face value, indicates that their data is bounded from above, i.e., there is a finite upper endpoint. However, this is merely a point estimate whose significance must be evaluated through a test of hypothesis. In the parametric setting, the natural choice would be the likelihood ratio test for the simple null hypothesis that \(\gamma =0\).

The semiparametric framework, where inference takes place in the domains of attraction rather than through the limiting distribution prescribed for the data (either GEV or GPD, depending on how we set about looking at extremes in the data at our disposal), has proven a fruitful and flexible approach.

Figure 4.2 is a scatter-plot representation of the actual observations in terms of the household number. With these two plots we intend to illustrate the two stated methods for the statistical analysis of extreme values, both BM and POT. The top panel shows all the data points. When proceeding with the BM method, one would take the largest observation at each \(h=1,2, \ldots , 503\), whereby one would have available a sample of 503 independent maxima. On the other hand, applying the POT method with the selected high threshold \(t=7\) kWh, we are left with fewer data points and, more importantly, with several observations originating from the same household. In this case, we find the observations fairly independent only because they are one week apart. We highlight that the POT method implies that some households are naturally discarded, which we find an important caveat to the POT method, a method that has been heralded for its efficient use of the available extreme data.
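The two selection schemes can be sketched in a few lines of Python. This is only an illustration: the record layout (household id, load in kWh) and the toy values are hypothetical, and the function names `block_maxima` and `exceedances` are introduced here for convenience.

```python
from collections import defaultdict


def block_maxima(records):
    """BM selection: keep the single largest load per household (block)."""
    best = {}
    for household, load in records:
        if household not in best or load > best[household]:
            best[household] = load
    return best


def exceedances(records, threshold):
    """POT selection: keep every load strictly above the threshold."""
    kept = defaultdict(list)
    for household, load in records:
        if load > threshold:
            kept[household].append(load)
    return dict(kept)


# Hypothetical toy data: (household id, load in kWh).
records = [(1, 3.2), (1, 7.5), (1, 8.1), (2, 4.0), (2, 6.9), (3, 9.3), (3, 2.2)]
bm = block_maxima(records)                  # one value per household
pot = exceedances(records, threshold=7.0)   # household 2 drops out entirely
```

In this toy sample the BM selection retains one value for each of the three households, whereas the POT selection keeps two values from household 1 and none from household 2, which mirrors the caveat raised above.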

## 4.1 Block Maxima and Peaks over Threshold Methods

We assume that *F* is in the domain of attraction of an extreme value distribution, i.e. \(F \in \mathcal D( G_\gamma )\). In order to better understand what we mean by inference in extreme domains of attraction, let us remind ourselves of the well-known condition of extended regular variation [3, 6, 7], introduced in Chap. 3, as tantamount to the domain of attraction condition. Notably, \(F\in {\mathcal {D}}(G_{\gamma })\) if and only if there exists a positive measurable function *a* such that the pertaining tail quantile function \(U\in {\textit{ERV}}_{\gamma }\). The limit in (3.10) coincides with the *U*-function of the GPD, with distribution function \(1+\log G_\gamma \), which justifies the usual inference on the excesses above a high threshold ascribed to the POT method. In fact, the extreme value condition (3.10) on the tail quantile function *U* is the usual assumption in semi-parametric inference for extreme outcomes. We shall develop this aspect further in Sect. 4.3. In Sect. 4.2 we will start off with yet another extreme value condition, equivalent to the extended regular variation of *U* and provided in [8], for dealing with block length and/or block number as opposed to looking at a certain number of upper order statistics above a sufficiently high (random) threshold.

Let *V* be the left generalised inverse of \(-1/\log F\), i.e. \(V\bigl (-1/\log (1-t)\bigr )= F^{\leftarrow }(1-t)\), for \(0\le t<1\). In other words, \(V(t)= F^{\leftarrow }(e^{-1/t})\), which conveys standardisation to the standard Fréchet. It is straightforward to see that the d.f. *F* underlying the random sample \((X_1,\ldots ,X_n)\) belongs to some max-domain of attraction if and only if there exist functions *a* and *b*, as defined in Theorem 3.2, such that

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{V(tx)-b(t)}{a(t)} = \frac{x^{\gamma }-1}{\gamma }, \quad x>0. \qquad (4.1) \end{aligned}$$

Here we may replace *b*(*t*) with *V*(*t*). This is where we focus next, as this factor reflects the bias stemming from absorbing *b* (or *V*) into the location parameter of the GEV limit distribution (see 3.2). The common understanding is that such bias is somewhat difficult to control, but we will have a closer look at this in terms of the second order refinements. The theoretical development for working out the order of convergence in Eq. (4.1) and (3.10) in tandem is given in Proposition 4.1. For the proof, we refer the reader to [9].

### Proposition 4.1

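Suppose that *U* is of extended regular variation of second order, that is, there exists a positive or negative function *A* with \(\lim _{t\rightarrow \infty } A(t)=0\) and a non-positive parameter \(\rho \), such that for \(x>0\),

$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{\frac{U(tx)-U(t)}{a(t)}-\frac{x^{\gamma }-1}{\gamma }}{A(t)} = H_{\gamma ,\rho }(x):= \frac{1}{\rho } \Bigl ( \frac{x^{\gamma +\rho }-1}{\gamma +\rho } -\frac{x^{\gamma }-1}{\gamma }\Bigr ). \qquad (4.2) \end{aligned}$$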

We now provide four examples of application of Proposition 4.1 alongside further details as to how the prominent GPD can, at a first glance, escape the grasp of this proposition.

### Example 4.1

**Burr**\((1, \tau , \lambda )\). This example develops along similar lines to the proof of Proposition 4.1. The Burr distribution, with d.f. \(1-(1+x^{\tau })^{-\lambda }\), \(x \ge 0\), \(\lambda , \tau >0\), provides a very flexible model which mirrors well the GEV behaviour in the limit of linearly normalised maxima, also allowing a wide scope for tweaking the order of convergence through changes in the parameter \(\lambda \). The associated tail quantile function is \(U(t)= (t^{1/\lambda }-1)^{1/\tau }, \, t \ge 1\). Upon Taylor’s expansion of *U*, the extreme tail condition up to second order (Eq. 4.2) arises:

### Example 4.2

**Cauchy**. The relevant d.f. is \(F(x)= \frac{1}{\pi } \arctan x + \frac{1}{2}\), \(x \in \mathbb {R}\). The corresponding tail quantile function is \(U(t)= \tan \bigl (\pi /2-\pi /t \bigr )= t/\pi -\pi /3\, t^{-1} +O(t^{-3})\), as \(t\rightarrow \infty \), and admits the representation \( U(tx)-U(t) = \frac{t}{\pi } \bigl [ x-1 -\frac{\pi ^2}{3} t^{-2}(x^{-1}-1) + O(t^{-4})\bigr ]\), \(x>0\). Hence, we have that \(\gamma =1\), \(\rho =-2\) in Eq. (4.2) with auxiliary function \(a(t)= t/\pi \). Proposition 4.1 thus ascertains that (Eq. 4.3) also holds true for the Cauchy distribution, where \(\gamma = 1\) and \(\tilde{\rho }= -2\).
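The Cauchy representation lends itself to a quick numerical sanity check. The sketch below, with helper names introduced purely for illustration, compares the first-order residual \((U(tx)-U(t))/a(t)-(x-1)\) with the leading correction \(-\frac{\pi ^2}{3}t^{-2}(x^{-1}-1)\) read off from the expansion above; it assumes *t* is large enough for the series to be meaningful.

```python
import math


def U_cauchy(t):
    """Tail quantile function tan(pi/2 - pi/t), written as a cotangent."""
    return 1.0 / math.tan(math.pi / t)


def first_order_residual(t, x):
    """(U(tx) - U(t)) / a(t) - (x - 1), with a(t) = t / pi and gamma = 1."""
    a = t / math.pi
    return (U_cauchy(t * x) - U_cauchy(t)) / a - (x - 1.0)


def predicted_residual(t, x):
    """Leading term of the residual according to the expansion in the text."""
    return -(math.pi ** 2 / 3.0) * t ** -2 * (1.0 / x - 1.0)


res_small = first_order_residual(100.0, 2.0)
res_large = first_order_residual(1000.0, 2.0)
# A tenfold increase in t should shrink the residual about a hundredfold,
# which is what a second order parameter rho = -2 means in practice.
decay_ratio = res_small / res_large
```

The ratio `decay_ratio` being close to 100 is the numerical footprint of \(\rho =-2\).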

### Example 4.3

**GPD**\((\gamma )\). The relevant d.f. is \(W_\gamma (x)= 1-(1+\gamma x)^{-1/\gamma }\), for all *x* such that \(1+\gamma x>0\). The pertaining tail quantile function is \(U(t)= (t^{\gamma }-1)/\gamma \), which is also born out of the exact tail condition (3.10). Clearly, *U* does not satisfy the second order condition (Eq. 4.2) in a straightforward fashion; however, we are going to show that the corresponding \(V(t)= U\bigl (1/(1-e^{-1/t})\bigr )\) satisfies (Eq. 4.3). To this end, we shall deal with the cases \(\gamma =1\) and \(\gamma \ne 1\) separately.

- Case \(\gamma =1\): Applying a Laurent series expansion to \((1-e^{-1/t})^{-1}\), we get
  $$\begin{aligned} V(tx)-V(t)= \Bigl (t-\frac{1}{12t} \Bigr )(x-1) +\frac{2}{12t} H_{1,-2}(x) +O(t^{-3}), \end{aligned}$$
  as \(t\rightarrow \infty \). Whence, the second order condition (Eq. 4.3) holds with \(\gamma =1\) and \(\tilde{\rho }=-2\), where \(\widetilde{A}(t)= t^{-2}/6\) and \(\widetilde{a}(t)= t(1+\widetilde{A}(t)/\tilde{\rho })\).
- Case \(\gamma \ne 1\): Upon Taylor’s expansion around zero, we obtain
  $$\begin{aligned} V(tx) -V(t)= t^{\gamma } \biggl [ \Bigl (1+\frac{\gamma -1}{2t} \Bigr )\frac{x^{\gamma }-1}{\gamma } -\frac{\gamma -1}{2t}H_{\gamma , -1}(x)\biggr ]+ O(t^{-3}), \end{aligned}$$
  as \(t\rightarrow \infty \). Whence, the second order condition (Eq. 4.3) holds with \(\tilde{\rho }=-1\), where \(t\widetilde{A}(t)= (1-\gamma )/2\) and \(\widetilde{a}(t)= t^{\gamma }(1+\widetilde{A}(t)/\tilde{\rho })\).
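The case \(\gamma \ne 1\) can be checked numerically. The sketch below is an illustration rather than part of the original derivation: it assumes the usual form \(H_{\gamma ,\rho }(x)=\frac{1}{\rho }\bigl (\frac{x^{\gamma +\rho }-1}{\gamma +\rho }-\frac{x^{\gamma }-1}{\gamma }\bigr )\) for \(\rho <0\), \(\gamma \ne 0\) and \(\gamma +\rho \ne 0\), and the function names are introduced here for convenience.

```python
import math


def H(x, gamma, rho):
    """Second order limit function; assumes rho < 0, gamma != 0, gamma + rho != 0."""
    return ((x ** (gamma + rho) - 1.0) / (gamma + rho)
            - (x ** gamma - 1.0) / gamma) / rho


def V_gpd(t, gamma):
    """V(t) = U(1 / (1 - exp(-1/t))) with U(s) = (s**gamma - 1) / gamma."""
    s = -1.0 / math.expm1(-1.0 / t)  # expm1 avoids cancellation for large t
    return (s ** gamma - 1.0) / gamma


def second_order_approx(t, x, gamma):
    """Right-hand side of the expansion for gamma != 1, without the error term."""
    A = (1.0 - gamma) / (2.0 * t)  # auxiliary function A~(t), with rho~ = -1
    return t ** gamma * ((1.0 - A) * (x ** gamma - 1.0) / gamma
                         + A * H(x, gamma, -1.0))


t, x, gamma = 1000.0, 2.0, 0.5
exact = V_gpd(t * x, gamma) - V_gpd(t, gamma)
first_order = t ** gamma * (x ** gamma - 1.0) / gamma
rel_err_first = abs(exact - first_order) / abs(exact)
rel_err_second = abs(exact - second_order_approx(t, x, gamma)) / abs(exact)
```

Adding the second order term should reduce the relative error by several orders of magnitude compared with the first order approximation alone.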

### Example 4.4

**Pareto**\((\alpha )\). This distribution is a particular case of the GPD d.f. in Example 4.3 with \(\gamma = 1/\alpha >0\) and \(U(t)= t^{1/\alpha }\); that is, *U* does not satisfy the second order condition (Eq. 4.2), and thus Proposition 4.1 remains applicable given an interpretation similar to that in Example 4.3.

### Example 4.5

**Contaminated Pareto**\((\alpha )\). We now consider the Pareto distribution with a light contamination in the tail by a slowly varying function \(L(t)=(1+\log t)\), that is, \(L(tx)/L(t) \rightarrow 1\), as \(t \rightarrow \infty \), for all \(x>0\). This gives rise to the quantile function \(U(t)= t^{1/\alpha }(1 + \log t)\), with \(\alpha >0\). For the sake of simplicity, we shall use the identification \(\gamma = 1/\alpha \). With some rearrangement, we can write the spacing \(U(tx)-U(t)\) in such a way that the first and second order parameters in condition (Eq. 4.2), both \(\gamma \) and \(\rho \le 0\), crop up: \(U(tx)-U(t)= \gamma t^{\gamma } (\log t +1) \bigl [ \bigl ( 1+\frac{1}{\gamma (\log t +1)}\bigr ) \frac{x^{\gamma }-1}{\gamma } +\frac{1}{1+\log t} H_{\gamma ,0}(x) \bigr ]\), where \(H_{\gamma ,0}(x):= \frac{1}{\gamma } \bigl ( x^{\gamma } \log x -\frac{x^{\gamma }-1}{\gamma }\bigr )\). Note that we have provided an exact equality, i.e. there is no error term. We thus find that tampering with the Pareto distribution, by contaminating its tail-related values with a slowly varying factor, is just enough to bring the convergence (Eq. 4.2) to a halt, which is flagged up by the largest admissible value \(\rho =0\). This stalling of the Pareto distribution makes it possible to fulfil the conditions in Proposition 4.1, thus ensuring that this contaminated Pareto distribution belongs to the max-domain of attraction of the GEV distribution with \(\gamma = 1/\alpha >0\) and \(\tilde{\rho }=0\).

## 4.2 Maximum Lq-Likelihood Estimation with the BM Method

We consider a sample of *k* i.i.d. block maxima, obtained by dividing the sample of size *n* into *k* blocks of equal length (time) *m*. For the extreme value theorem to hold within each block, the block length must be sufficiently large, i.e. one needs to let *m* tend to infinity to be able to proceed with inference. We are then led to the reasonable assumption that the sample of *k* maxima behaves approximately as though it stems from the GEV distribution.

The GEV d.f. with parameter vector \(\theta =(\mu ,\sigma ,\gamma )\) is given by

$$\begin{aligned} G_{\theta }(x)= \exp \Bigl \{ -\Bigl [1+\gamma \frac{x-\mu }{\sigma }\Bigr ]^{-1/\gamma }\Bigr \}, \end{aligned}$$

for all *x* such that \(\sigma +\gamma (x-\mu )>0\). The parametric fit in the BM framework uses the GEV density, which we denote by \(g_{\theta }\) and which may differ slightly from the true unknown p.d.f. *f* underlying the sampled data. We typically estimate the constants *a*(*m*) and *b*(*m*) via maximum likelihood, despite these being absorbed into the scale \(\sigma >0\) and location \(\mu \in \mathbb {R}\) parameters of the parametric limiting distribution and thus eventually assumed fixed. As a result, BM-type estimators are not so accurate for small block sizes, since these estimators must rely on blocks of reasonable length to fulfil the extreme value theorem.

As *q* tends to 1, the MLq estimator approaches the usual MLE. The common understanding is that values of *q* closer to one are preferable when we have numerous maxima drawn from large blocks, since this will give enough scope for the EVT to be accessible and applicable. In practice, we often encounter limited sample sizes \(n= m\times k\), in the sense that either only a small number of extremes (*k* sample maxima) is available or the blocks are of insufficient length *m* to contain even one extreme. MLq estimators have been recognised as particularly useful in dealing with small sample sizes, which is often the situation in the analysis of extreme values due to the inherent scarcity of extreme events with catastrophic impact. Previous research by [10, 11] shows that the main contribution towards the relative decrease in the mean squared error stems from the variance reduction, which is the operative statement in small sample estimation. This is in contrast with the bias reduction often sought after in connection with large sample inference. Large enough samples tend to yield stable and smooth trajectories in the estimate paths, allowing scope for bias to set in, and eventually reflecting the regularity conditions in the maximum likelihood sense. Once these asymptotic conditions are attained, the dominant component of the bias starts to emerge, and by then it can be efficiently removed. This is often implemented at the expense of an increased variance, for the usual bias/variance trade-off in statistics seems never to offer anything worthwhile on one side without also providing a detriment to the other.
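To make the role of the distortion parameter *q* concrete, the following sketch evaluates an Lq-likelihood objective for the GEV. It assumes the usual Lq deformation of the logarithm, \(L_q(u)=(u^{1-q}-1)/(1-q)\) for \(q\ne 1\) and \(L_q(u)=\log u\) for \(q=1\), which may differ in detail from the definition adopted in the chapter; the sample values and parameter values are hypothetical, and no optimiser is included.

```python
import math


def lq(u, q):
    """Lq deformation of the logarithm; reduces to log(u) when q == 1."""
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def gev_pdf(x, mu, sigma, gamma):
    """GEV density for gamma != 0; zero outside sigma + gamma * (x - mu) > 0."""
    z = 1.0 + gamma * (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    return z ** (-1.0 / gamma - 1.0) * math.exp(-z ** (-1.0 / gamma)) / sigma


def lq_likelihood(sample, mu, sigma, gamma, q):
    """Objective that an MLq-type estimator would maximise over (mu, sigma, gamma)."""
    return sum(lq(gev_pdf(x, mu, sigma, gamma), q) for x in sample)


# Hypothetical block maxima (kWh) and hypothetical parameter values.
maxima = [9.1, 10.4, 8.7, 11.2, 9.8]
near_one = lq_likelihood(maxima, mu=9.5, sigma=1.0, gamma=-0.14, q=0.999999)
classical = lq_likelihood(maxima, mu=9.5, sigma=1.0, gamma=-0.14, q=1.0)
```

For *q* close to 1 the two objectives nearly coincide, which is the sense in which the MLq estimator approaches the MLE; smaller *q* downweights observations with low density under the fitted model.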

Figure 4.3 displays the sample paths of the adopted extreme value index estimators, plotted against several values of the distortion parameter \(q \in [0.5,1]\). As *q* increases to 1, the deformed versions of both ML and MPS estimators approach their classical counterparts, which stem from stipulating the natural logarithm as criterion function. The estimates for the EVI seem to consolidate between \(-0.14\) and \(-0.13\). The negative estimates obtained for all values of *q* provide evidence that the true d.f. *F* generating the data belongs to the Weibull domain of attraction, and therefore we can reasonably conclude that we are in the presence of a short tail with finite right endpoint. The next section concerns the estimation of this *ultimate return-level*.

### 4.2.1 Upper Endpoint Estimation

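Replacing all the *unknowns* in the above by their empirical analogues yields the estimator for the right endpoint:

$$\begin{aligned} \hat{x}^F := \hat{V}(m) - \frac{\hat{a}(m)}{\hat{\gamma }}, \end{aligned}$$

where \(\hat{a}(m)\), \(\hat{V}(m)\) and \(\hat{\gamma }\) denote estimators for *a*(*m*) and *V*(*m*), and for the EVI, respectively.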
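In the parametric GEV fit, a negative shape parameter implies that the support is bounded above at \(\mu -\sigma /\gamma \), which gives a plug-in endpoint estimate once location, scale and shape have been estimated. A minimal sketch, with hypothetical parameter values and a function name introduced for illustration:

```python
import math


def gev_upper_endpoint(mu, sigma, gamma):
    """Upper endpoint of a GEV(mu, sigma, gamma) law: finite only when gamma < 0."""
    if gamma >= 0.0:
        return math.inf
    return mu - sigma / gamma


# Hypothetical fitted values, not taken from the data analysed in this chapter.
endpoint = gev_upper_endpoint(mu=8.0, sigma=1.0, gamma=-0.25)
```

Because the estimate divides by the shape estimate, it becomes very unstable as the latter approaches zero from below.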

Figure 4.4 displays the endpoint estimates for several values of \(q\le 1\) with respect to the tilted version of both ML and MPS estimators. The latter always finds larger estimates than the former, with a stark distance from the observed overall maximum. Since the adopted estimators do not seem to herd towards one value, it is not easy to reconcile them. Given that the maximum likelihood estimator has been widely used and extensively studied in the literature, it is sensible to ascribe preference to this estimator. Furthermore, since we are dealing with small sample sizes (we are taking the maximum over 7 weeks), the distorted version, i.e., the MLq estimator, must be taken into account. Therefore, we find it reasonable to take as estimate for the upper bound the midpoint of the region where the MLq changes direction and travels across towards the plain ML estimator with increasing *q*. Thus, we get an upper bound of approximately 14.0 kWh.

## 4.3 Estimating and Testing with the POT Method

In Sect. 4.1, we noticed that the function appearing in the limit of the extended regular variation of *U* matches the tail quantile function of the Generalised Pareto distribution. This fact reflects the exceptional role of the GPD in the extreme value theory for exceedances [16, 17] and prompts the need for classifying the tails of all possible distributions in \(\mathcal {D}( G_\gamma )\) into three classes according to the sign of the extreme value index \(\gamma \). For positive \(\gamma \), the power-law behaviour of the underlying tail distribution function \(1-F\) has important implications, one of which is the presence of infinite moments. When \(\gamma >0\), the first order condition (3.10) can be rephrased as \(\lim _{t \rightarrow \infty } U(tx)/U(t)=x^\gamma \), for all \(x>0\), that is, *U* is \(\gamma \)-regularly varying at infinity (notation: \(U \in RV_\gamma \)); Karamata’s theorem then ascertains that \(E(X_1^+)^p\) is infinite for \(p>1/\gamma \), where \(X_1^+= \max (0, X_1)\). Hence, heavy-tailed distributions in a max-domain of attraction not only have an infinite right endpoint, but the order of their finite moments is also determined by the magnitude of the EVI \(\gamma >0\). The Fréchet domain of attraction contains distributions with polynomially decaying tails such as the Pareto, Cauchy, Student’s *t* and Fréchet itself. All d.f.’s belonging to \(\mathcal {D}(G_\gamma )\) with \(\gamma <0\) (the Weibull domain of attraction) are light-tailed distributions with finite right endpoint. This domain of attraction includes the Uniform and Beta distributions. The intermediate case \(\gamma =0\) is of particular interest in many applied sciences where extremes are relevant. At first glance, the Gumbel domain of attraction seems quite appealing for the simplicity of inference in connection with \(G_0\), which dispenses with the estimation of \(\gamma \). But a closer inspection reveals the great variety of distributions possessing such exponential tail-related behaviour, whether having a finite upper endpoint or not. Normal, Gamma and Lognormal distributions can all be found in the Gumbel domain. The negative Fréchet distribution also belongs to the Gumbel domain, albeit with finite endpoint (cf. [18]). Therefore, a statistical test for assessing the significance of the EVI would be of great use and most consequential. Looking for the most propitious type of tail before estimating tail-related features of the distribution underlying the data can mark the difference between aiming at the estimation of an extreme quantile and sprinting to the estimation of the upper endpoint. In fact, adopting statistical methodology tailored to the suitable domain of attraction for the underlying d.f. *F* has become regular practice.

### 4.3.1 Selection of the Max-Domain of Attraction

A test for the Gumbel domain *versus* the Fréchet or Weibull max-domains has received in the literature the general designation of statistical choice of extreme domains of attraction. References in this respect are [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29].

For the *k* upper order statistics from a sample of size *n* such that \(k=k_n\) is an intermediate sequence, i.e., \(k\rightarrow \infty \) and \(k/n\rightarrow 0\) as \(n\rightarrow \infty \), define the test statistics, where *T* has to be conveniently replaced by *R* or *W*.

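As an illustration of how such tests operate on the excesses above the random threshold \(X_{n-k,n}\), the sketch below computes a Greenwood-type statistic (ratio of the second empirical moment of the excesses to the squared first moment) and a Hasofer-Wang-type statistic derived from it. The specific forms and the normalisation \(\sqrt{k/4}\), under which both standardised statistics are approximately standard normal in the Gumbel domain, are assumptions drawn from the general semi-parametric literature and are not necessarily identical to the definitions used in this chapter.

```python
import math


def excesses(sample, k):
    """Excesses of the k largest observations over the threshold X_{n-k,n}."""
    xs = sorted(sample)
    threshold = xs[-k - 1]
    return [x - threshold for x in xs[-k:]]


def greenwood_type(sample, k):
    """Second empirical moment of the excesses over the squared first moment."""
    z = excesses(sample, k)
    m1 = sum(z) / k
    m2 = sum(v * v for v in z) / k
    return m2 / (m1 * m1)


def standardised_statistics(sample, k):
    """Return (G*, W*): roughly N(0, 1) under the Gumbel null (assumed form)."""
    g = greenwood_type(sample, k)
    w = 1.0 / (k * (g - 1.0))  # Hasofer-Wang-type statistic (assumed form)
    scale = math.sqrt(k / 4.0)
    return scale * (g - 2.0), scale * (k * w - 1.0)


# Deterministic stand-in for a short-tailed sample: an evenly spaced grid on (0, 1].
uniform_grid = [i / 1000.0 for i in range(1, 1001)]
g_star, w_star = standardised_statistics(uniform_grid, k=100)
```

For this short-tailed example the Greenwood-type statistic falls well below its Gumbel benchmark and the Hasofer-Wang-type statistic well above, which is the direction associated with the Weibull domain.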

### 4.3.2 Testing for a Finite Upper Endpoint

The aim of this section is to assess finiteness in the right endpoint of the actual d.f. *F* underlying the Irish smart meter data. The basic assumption is that *F* belongs to some max-domain of attraction. We then consider the usual asymptotic setting, where we assume an intermediate number *k* of order statistics to draw inference upon, that is, we take \(k=k_n \rightarrow \infty \) and \(k_n/n \rightarrow 0\), as \(n\rightarrow \infty \), and hence the corresponding random threshold \(X_{n-k,n} \rightarrow x^F\; a.s.\).

The choice of the number *k* of intermediate order statistics is of paramount importance to any inference problem in extremes. Many methods have been proposed, but sadly there is no universal solution that can hold for the multitude of estimators and testing procedures available. Here, we loosely follow the guideline of [23] in that the most adequate choice of the intermediate number *k* (which carries over to the subsequent semi-parametric inference) should settle on the lowest *k* at which the critical barriers are overtaken. The ratio test (Eq. 4.9), which is known to be the most conservative of all three tests for choice of domains, does not reject the null hypothesis of the Gumbel domain, since the green trajectory remains above the second horizontal line from below for all intermediate values of *k* considered. We have remarked that the Hasofer and Wang (HW) test defined in Eq. (4.13) is the most powerful test for detecting distributions in the Weibull domain of attraction. The application of the HW test seems to do justice to this Irish smart meter data set and finds sufficient evidence to reject the null hypothesis of the Gumbel domain in favour of entertaining estimation procedures suited to the Weibull domain. Therefore we will proceed to the estimation of the finite upper bound in the POT framework. This will be tackled in the next section.

### 4.3.3 Upper Endpoint Estimation

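Estimators of the upper endpoint are obtained by replacing \(U(n/k)\), \(a(n/k)\) and \(\gamma \) by suitable consistent estimators, i.e.

$$\begin{aligned} \hat{x}^F := \hat{U}(n/k) - \frac{\hat{a}(n/k)}{\hat{\gamma }}. \end{aligned}$$

Since we do not assume functions *U* and *a* specific to the underlying d.f. *F*, inference must be built on summary statistics that can capture the most distinctive traits in tail-related observations. The method of moments is ideally suited to this purpose.

An estimator of the EVI in this spirit arises from replacing *F* with its empirical counterpart \(F_n\) and *t* by the order statistic \(X_{n-k,n}\) with \(k<n\). This led to a statistic with the following properties: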

- like the Pickands and Moment estimators, it is valid for any \(\gamma \in \mathbb {R}\) and, contrary to the maximum likelihood estimator, it has a simple explicit functional form;
- it is very close to the maximum likelihood estimator for \(\gamma \ge 0\);
- if \(\gamma \le 0\), its asymptotic variance is the same as that of the Moment estimator; if \(\gamma >0\), its asymptotic variance is equal to that of the maximum likelihood estimator;
- a shift invariant version with similar properties is available, with the same asymptotic variance and without sacrificing the dominant component of the bias, which is never increased as long as we keep a suitable number *k*;
- there are accompanying shift and scale estimators that make e.g. high quantile and upper endpoint estimation straightforward.
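For orientation, the sketch below implements two classical moment-type estimators of the EVI built on the log-excesses over \(X_{n-k,n}\): the Hill estimator (valid for \(\gamma >0\)) and the Moment estimator of Dekkers, Einmahl and de Haan (valid for any real \(\gamma \)). These are standard benchmarks and not necessarily the estimator whose properties are listed above; the code assumes a positive sample, and the deterministic quantile grids are hypothetical stand-ins for data.

```python
import math


def log_moments(sample, k):
    """First two empirical moments of log X_{n-i,n} - log X_{n-k,n}, i = 0..k-1."""
    xs = sorted(sample)
    base = math.log(xs[-k - 1])
    logs = [math.log(x) - base for x in xs[-k:]]
    m1 = sum(logs) / k
    m2 = sum(v * v for v in logs) / k
    return m1, m2


def hill(sample, k):
    """Hill estimator of gamma > 0."""
    return log_moments(sample, k)[0]


def moment_estimator(sample, k):
    """Moment estimator of gamma, valid for any real gamma."""
    m1, m2 = log_moments(sample, k)
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)


n, k = 1000, 100
# Idealised samples: quantiles of a Pareto law with gamma = 0.5 and of a uniform law.
pareto_grid = [(j / (n + 1.0)) ** -0.5 for j in range(1, n + 1)]
uniform_grid = [j / (n + 1.0) for j in range(1, n + 1)]
gamma_hill = hill(pareto_grid, k)
gamma_moment_heavy = moment_estimator(pareto_grid, k)
gamma_moment_short = moment_estimator(uniform_grid, k)
```

On the Pareto grid both estimators land near 0.5, whereas on the uniform grid the Moment estimator returns a clearly negative value, in line with a finite upper endpoint.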

Figure 4.6 shows the sample paths of several estimators of the EVI as the upper number *k* of order statistics embedded in the estimators increases, concomitantly lowering the threshold until the value 5 kWh is reached. The standard practice for drawing meaningful conclusions from this type of plot is to eyeball the trajectories and look for a plateau of stability at the confluence of the adopted estimators. In the top panel of Fig. 4.6, the MLq estimator of the extreme value index \(\gamma \), which has no explicit closed form and thus delivers estimates numerically, experiences convergence issues. This is often the case with maximum likelihood estimation for the GPD when the true value is negative but close to zero.

In the semi-parametric setting, whilst working in the domain of attraction rather than dealing with the limiting distribution itself, the upper intermediate order statistic \(X_{n-k,n}\) plays the role of the high deterministic threshold \(u\uparrow x^F\le \infty \) above which the parametric fit to the GPD is applicable. For the asymptotic properties of the POT maximum likelihood estimator of the EVI under a semi-parametric approach, see e.g. [34, 35, 36]. Although theoretically well determined, even when \(\gamma \uparrow 0\), the non-convergence to a ML-solution can be an issue when \(\gamma \) is close to zero. There are also irregular cases which may compromise the practical applicability of ML. Theoretical and numerical accounts of these issues can be found in [37, 38] and references therein.

In the second panel in Fig. 4.6, we swap the MLq estimator with the MPS (or MSP) estimator for the GPD. Although there are issues in the numerical convergence for small values of *k*, where the variance is larger, this estimator shows enhanced behaviour, returning estimates of the EVI in agreement with the remaining estimators. Therefore, it seems reasonable to settle on the point estimate \(\hat{\gamma }=-0.01\). It is worth highlighting that the MLq shows its best performance within the corresponding region of values of *k*, that is, for *k* between 125 and 175, a region that also holds feasible for the tests expounded in Sect. 4.3.1.

Figure 4.7 displays the estimates yielded by several endpoint estimators in the class (Eq. 4.17), with the accompanying general endpoint estimator. Again, the corresponding maximum Lq-likelihood estimator is found with the distortion parameter *q* set equal to 0.88, where the *k*-values for which the Lq-likelihood method experienced convergence issues in the estimation of the EVI are now omitted. The value \(q=1\) determines the mainstream ML estimator for the endpoint, which the class of estimators defined in Eq. (4.17) also encompasses. The relative finite sample performance of these endpoint estimators is here compared with the naïve maximum estimator \(X_{n,n}\). We recall that the observed maximum is 12.01. The general endpoint estimator consistently returns values around 12.4 for almost all values of *k*. All the other estimators, except the MLq estimator for the upper endpoint, seem to exaggerate the upper bound for the electricity load. Therefore, we find it reasonable to advise that the estimate for the upper endpoint of \(\hat{x}^F=13.0\) kWh should be taken as a benchmark for future work concerning technologies to enable end use energy demand. We also issue the cautionary remark that this is a mere point estimate, devoid of any confidence bounds.

## 4.4 Non-identically Distributed Observations—Scedasis Function

Social demographics and behavioural changes are not likely to occur within the time span considered in this illustrative example, but we can see that in other applications to electricity consumption even a decade is enough time for human behaviour to be vastly different. Regulation of the gas and electricity industry, introduction of software applications that can monitor and incentivise certain consumption over others, time-of-use tariffs, new low carbon technologies and the variety of electronic devices in homes will change the way consumers interact with the grid and consume their electricity, and most probably have already changed it.

Nonetheless we still want to be able to address the same questions as we did before and we want to ensure that there is a probabilistic framework that grounds our statistical analysis. Our main concern lies with observations that are not identically distributed but we will include a short review for data that exhibit serial dependence.

There are of course many ways to look at dependence in data sets indexed in time or space. Ways of pre-processing data to alleviate dependence issues and possible non-stationarity have been considered in [39]. Historically, however, simpler clustering techniques have been employed [40]. We have already discussed, apropos of the BM method, choosing the block length in such a way that the dependence no longer plays a role or is weak enough for the extreme value theorem to hold. Ensuring that our observations come from separate events is a simple way of ascertaining independence and that the sampled data contains meaningful information on the widely different phenomena emanating from similar hazards (earthquakes, rainfall, storms). For example, if we were considering heatwaves, they may last up to a fortnight; thus daily maxima in temperature taken during a heatwave will be part of the same weather system and consequently dependent, and also less representative of the wild. Similarly, if we are considering rainfall, a low pressure system may last two or three days; thus maxima taken every 2–3 non-overlapping days may be considered independent or only weakly dependent. In the same vein, we may look at weekly maxima of the electric load profiles of individual households to weed out the daily and sub-weekly patterns. Thus, the block to be chosen should be application and data specific.

However, sometimes there is also temporal dependence in addition to individual events, i.e. profiles change with time. For temperature data, the change comes with the season as well as with the diurnal cycle. Similarly, for electricity data there are many sources of seasonality to consider: the impact from temperature, i.e., an annual cycle, and the daily cycle, as well as human behaviour, which may repeat weekly. For example, we may restrict ourselves to taking weekly maxima from the summer only. This is where we turn our focus to non-stationary extremes, meaning that the underlying distribution changes over time or across space or both. This aspect will be exploited through the definition of a trend in the frequency of extremes in such a way as to maintain integrity as we move across the potentially different distribution functions ascribed to each of the considered time intervals (or spatial regions), assumed homogeneous within themselves and heterogeneous between them. The basic structural assumption on the trend in the time-space evolving probability that a certain large outcome is exceeded originates from the concept of comparable tails [41]. There have been estimators accounting for the heteroscedastic (non-stationarity in the scale) setting, first introduced by [42] and further developed by [43] to address challenges arising in the modelling of heavy tails.

Consider independent observations \(X_1, \ldots , X_n\) taken at *n* time points, which are assumed to follow different distribution functions \(F_{n,1}, \ldots , F_{n,n}\) but sharing a common upper endpoint denoted by \(x^F\). Suppose the following limit relation, involving an offset or baseline distribution function *F* and a continuous positive function *c* defined on [0, 1], holds:

\[ \lim _{x \rightarrow x^F} \frac{1-F_{n,i}(x)}{1-F(x)} = c\left( \frac{i}{n}\right) , \]

uniformly for all \(i = 1, \ldots , n\) and all *n*. The function *c*, known as the scedasis function, serves to represent the frequency of extremes in the tail. Einmahl et al. [43] advocate a kernel density estimator as ideally suited to tackle the estimation of the scedasis function, which can be viewed as a density in itself. Specifically, the estimator for *c*(*s*) is given by

\[ \hat{c}(s) = \frac{1}{kh} \sum _{i=1}^{n} I_{\{X_i > X_{n,n-k}\}} \, G\left( \frac{s - i/n}{h}\right) , \]

where *G* is a continuous, symmetric kernel such that \(\int _{-1}^{1} G(s)ds = 1\), with \(G(s) = 0\) for \(|s|>1\). The bandwidth \(h := h_n\) satisfies \(h \rightarrow 0\) and \(kh \rightarrow \infty \) as \(n \rightarrow \infty \), and \(I_{A}\) denotes the indicator function, which is equal to 1 if *A* holds true and equal to 0 otherwise. Finally, we note that \(X_{n,n-k}\) is the global random threshold, an upper order statistic of the whole sample chosen so that \(k\) observations exceed it. We shall defer the estimation of the scedasis in practice to Chap. 5 using the Thames Valley Vision data.
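To make the kernel estimator concrete, here is a minimal Python sketch that evaluates \(\hat{c}(s) = \frac{1}{kh} \sum _{i=1}^{n} I_{\{X_i > X_{n,n-k}\}} G((s - i/n)/h)\), which is how we read the estimator of [43]. The function names, the choice of the biweight kernel for *G*, the values of *k* and *h*, and the simulated heavy-tailed sample are all assumptions made for illustration; they are not the settings used with the Thames Valley Vision data.

```python
import numpy as np


def biweight_kernel(u):
    """Biweight kernel: symmetric, zero outside [-1, 1], integrates to 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)


def scedasis_estimate(x, k, h, s_grid, kernel=biweight_kernel):
    """Kernel estimate of the scedasis function c(s) on the points s_grid.

    x is the sample in time order, k the number of upper order statistics
    and h the bandwidth (all names here are illustrative).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if h <= 0:
        raise ValueError("h must be positive")
    # Global threshold X_{n,n-k}: exactly k observations exceed it (no ties).
    threshold = np.sort(x)[n - k - 1]
    exceeds = x > threshold
    times = np.arange(1, n + 1) / n  # rescaled time points i/n
    s_grid = np.atleast_1d(np.asarray(s_grid, dtype=float))
    u = (s_grid[:, None] - times[None, :]) / h
    return (kernel(u) * exceeds[None, :]).sum(axis=1) / (k * h)


# Simulated illustration: heavy-tailed data whose scale grows over time,
# so extremes should become more frequent towards the end of the sample.
# Note that numpy's pareto draws from the Lomax (Pareto II) distribution.
rng = np.random.default_rng(1)
n = 5000
scale = 0.5 + np.arange(1, n + 1) / n
sample = scale * rng.pareto(3.0, size=n)
grid = np.linspace(0.1, 0.9, 9)
c_hat = scedasis_estimate(sample, k=400, h=0.1, s_grid=grid)
```

The grid is kept away from 0 and 1 because a kernel estimate of this kind loses mass near the boundaries of [0, 1]; any boundary correction would be a further design choice not covered by this sketch.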

## References

- 1. Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.L.: Statistics of Extremes: Theory and Applications. Wiley (2004)
- 2. Castillo, E., Hadi, A., Balakrishnan, N., Sarabia, J.M.: Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken, New Jersey (2005)
- 3. de Haan, L., Ferreira, A.: Extreme Value Theory: An Introduction. Springer (2006)
- 4. Gumbel, E.J.: Statistics of Extremes. Columbia University Press, New York and London (1958b)
- 5. Melinda Gálfi, V., Bódai, T., Lucarini, V.: Convergence of extreme value statistics in a two-layer quasi-geostrophic atmospheric model. Complexity (2017)
- 6. Bingham, N., Goldie, C., Teugels, J.: Regular Variation. Cambridge University Press (1987)
- 7. de Haan, L.: On regular variation and its application to the weak convergence of sample extremes. Ph.D. thesis, Mathematisch Centrum Amsterdam (1970)
- 8. Ferreira, A., de Haan, L.: On the block maxima method in extreme value theory: PWM estimators. Ann. Statist. **43**(1), 276–298 (2015)
- 9. Jeffree, C., Neves, C.: Tilting maximum Lq-likelihood estimation for extreme values drawing on block maxima. Technical report (2018). arXiv:1810.03319
- 10. Ferrari, D., Yang, Y.: Maximum Lq-likelihood estimation. Ann. Statist. **38**(2), 753–783 (2010)
- 11. Ferrari, D., Paterlini, S.: The maximum Lq-likelihood method: an application to extreme quantile estimation in finance. Methodol. Comput. Appl. Probab. **11**, 3–19 (2009)
- 12. Cheng, R.C.H., Amin, N.A.K.: Estimating parameters in continuous univariate distributions with a shifted origin. J. Roy. Statist. Soc. Ser. B **45**, 394–403 (1983)
- 13. Ranneby, B.: The maximum spacing method. An estimation method related to the maximum likelihood method. Scand. J. Statist. **11**, 93–112 (1984)
- 14. Ekström, M.: Consistency of generalized maximum spacing estimates. Scand. J. Statist. **28**(2), 343–354 (2001)
- 15. Huang, C., Lin, J.-G.: Modified maximum spacings estimator for generalized extreme value distribution and applications in real data analysis. Metrika **77**, 867–894 (2013)
- 16. Balkema, A.A., de Haan, L.: Residual life time at great age. Ann. Prob. 792–804 (1974)
- 17. Pickands, J.: Statistical inference using extreme order statistics. Ann. Stat. 119–131 (1975)
- 18. Fraga Alves, I., Neves, C.: Estimation of the finite right endpoint in the Gumbel domain. Statistica Sinica **24**, 1811–1835 (2014)
- 19. Galambos, J.: A statistical test for extreme value distributions. In: Gnedenko, B.V. et al. (eds.) Non-parametric Statistical Inference, North Holland, Amsterdam, pp. 221–230 (1982)
- 20. Castillo, E., Galambos, J., Sarabia, J.M.: The selection of the domain of attraction of an extreme value distribution from a set of data. In: Hüsler, J., Reiss, R.-D. (eds.) Extreme Value Theory. Lecture Notes in Statistics, vol. 51, pp. 181–190 (1989)
- 21. Hasofer, A.M., Wang, Z.: A test for extreme value domain of attraction. J. Am. Stat. Assoc. **87**, 171–177 (1992)
- 22. Fraga Alves, M.I., Gomes, M.I.: Statistical choice of extreme value domains of attraction - a comparative analysis. Commun. Stat. Theory Methods **25**(4), 789–811 (1996)
- 23. Wang, J.Z., Cooke, P., Li, S.: Determination of domains of attraction based on a sequence of maxima. Austr. J. Stat. **38**(2), 173–181 (1996)
- 24. Marohn, F.: An adaptive test for Gumbel domain of attraction. Scand. J. Stat. **25**(2), 311–324 (1998a)
- 25. Marohn, F.: Testing the Gumbel hypothesis via the POT-method. Extremes **1**(2), 191–213 (1998b)
- 26. Segers, J., Teugels, J.: Testing the Gumbel hypothesis by Galton’s ratio. Extremes **3**(3), 291–303 (2000)
- 27. Neves, C., Picek, J., Alves, M.F.: The contribution of the maximum to the sum of excesses for testing max-domains of attraction. J. Stat. Plan. Inference **136**(4), 1281–1301 (2006)
- 28. Neves, C., Fraga Alves, M.I.: Semi-parametric approach to the Hasofer-Wang and Greenwood statistics in extremes. Test **16**(2), 297–313 (2007)
- 29. Fraga Alves, I., Neves, C., Rosário, P.: A general estimator for the right endpoint with an application to supercentenarian women’s records. Extremes **20**(1), 199–237 (2017)
- 30. Neves, C., Pereira, A.: Detecting finiteness in the right endpoint of light-tailed distributions. Stat. Probab. Lett. **80**(5), 437–444 (2010)
- 31. Smith, R.L.: Estimating tails of probability distributions. Ann. Stat. **15**, 1174–1207 (1987)
- 32. Dekkers, A.L.M., Einmahl, J.H.J., de Haan, L.: A moment estimator for the index of an extreme-value distribution. Ann. Stat. **17**, 1833–1855 (1989)
- 33. Fraga Alves, M.I., Gomes, M.I., de Haan, L., Neves, C.: Mixed moment estimator and location invariant alternatives. Extremes **12**(2), 149–185 (2009)
- 34. Drees, H., Ferreira, A., de Haan, L.: On maximum likelihood estimation of the extreme value index. Ann. Appl. Probab. **14**(3), 1179–1201 (2004)
- 35. Qi, Y., Peng, L.: Maximum likelihood estimation of extreme value index for irregular cases. J. Stat. Plan. Inference **139**, 3361–3376 (2009)
- 36. Zhou, C.: The extent of the maximum likelihood estimator for the extreme value index. J. Multivar. Anal. **101**(4), 971–983 (2010)
- 37. Castillo, J., Daoudi, J.: Estimation of generalized Pareto distribution. Stat. Probab. Lett. **79**, 684–688 (2009)
- 38. Castillo, J., Serra, I.: Likelihood inference for generalized Pareto distribution. Comput. Stat. Data Anal. **83**, 116–128 (2015)
- 39. Eastoe, E.F., Tawn, J.A.: Modelling non-stationary extremes with application to surface level ozone. J. R. Stat. Soc.: Ser. C (Appl. Stat.) **58**(1), 25–45 (2009)
- 40. Coles, S., Bawa, J., Trenner, L., Dorazio, P.: An Introduction to Statistical Modeling of Extreme Values, vol. 208. Springer (2001)
- 41. Resnick, S.I.: Tail equivalence and its applications. J. Appl. Prob. **8**, 135–156 (1971)
- 42. de Haan, L., Tank, A.K., Neves, C.: On tail trend detection: modeling relative risk. Extremes **18**(2), 141–178 (2015)
- 43. Einmahl, J.H., de Haan, L., Zhou, C.: Statistics of heteroscedastic extremes. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) **78**(1), 31–51 (2016)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.