Extreme Value Theory

The main topics covered in this chapter are:
• what extreme value theory is and how it differs from classical statistics;
• the two pillars of extreme value theory: the Fisher–Tippett–Gnedenko theorem and the Pickands–Balkema–de Haan theorem;
• the three classes into which the limit distribution of maxima falls: the Fréchet, Weibull, or Gumbel distribution;
• the generalized Pareto distribution;
• the maximum domain of attraction of an extreme value distribution and the concept of tail equivalence;
• the theory of maxima for stationary processes;
• extreme value theory for multivariate distributions;
• the role of copulas in multivariate extreme value theory;
• the three types of copulas;
• three estimation methods for distributions: maximum likelihood estimation, the method of moments, and special estimators;
• the Hill estimator and the Pickands estimator for estimating the shape parameter of a distribution;
• the use and limitations of the quantile plot (QQ-plot) for verifying statistical hypotheses by examining deviations from linearity under a hypothesized distribution;
• three different approaches to computing widely used risk measures (VaR and AVaR).

From travel disruptions to natural disasters, extreme events have long captured the public's imagination and attention. Due to their rarity and the calamity often associated with them, they make waves in the news (Fig. 3.1) and stir discussion in the public realm: is it a freak event? Events of this sort may be shrouded in mystery for the general public, but a particular branch of probability theory, namely Extreme Value Theory (EVT), offers insight into their inherent scarcity and stark magnitude. EVT is a wonderfully rich and versatile theory which has already been adopted by a wide variety of disciplines. From its humble beginnings in reliability engineering and hydrology, it has now expanded much further; it can be used to model the occurrence of records (for example, in athletic events) or to quantify the probability of floods with magnitude greater than what has been observed in the past, i.e. it allows us to extrapolate beyond the range of available data! In this book, we are interested in what EVT can tell us about the electricity consumption of individual households. We already know a lot about what regions and countries do on average, but not enough about what happens at the substation level, or at least not with enough accuracy. We want to consider "worst" case scenarios such as an area-wide blackout, or "very bad" case scenarios such as a circuit fuse blowout or a low-voltage event. Distribution System Operators (DSOs) may want to know how much electricity they will need to make available for the busiest time of day up to two weeks in advance. Local councils or policy makers may want to decide whether a particular substation is equipped to meet the demands of the residents and whether it needs an upgrade or maintenance. EVT can help us answer some of these questions, and perhaps more as we develop and adapt the theory and approaches further.
There are many ways to infer properties of a population from various sample statistics. Depending on the statistic, a theory about how well it estimates the parameter of interest can be developed. The sample average is one such, very common, statistic. Together with the law of large numbers and subsequently the central limit theorem (as well as other results), a well-known framework exists. However, this framework is lacking in various ways. Some of these limitations are linked to the assumptions of a normal distribution with finite mean and variance. But what if the underlying distribution does not have finite variance, or indeed even a finite mean? Moreover, the processes involved in generating a "typical" event may be different from the processes involved in generating an extreme event, e.g. the difference between a windy day and a storm event. Or perhaps extreme events may come from different distributions: for example, internet protocols are the set of rules that set standards for data transmitted over the internet (or another network). Faloutsos et al. [1] concluded that power-laws can help analyse the average behaviour of network protocols, whereas simulations from [2] showed exponential distributions in the tails.
EVT establishes the probabilistic and statistical theory of a different sample statistic: unsurprisingly, of extremes. Even though the study of EVT did not gain much traction before [3], some fundamental studies had already emerged in the earlier part of the twentieth century. While not the earliest analysis of extremes, the development of the general theory started with the work of [4]. The paper concerns itself with the distribution of the range of random samples drawn from the normal distribution. It was this work that officially introduced the concept of the distribution of the largest value. In the following years, [5,6] evaluated the expected value and median of such distributions, and the latter extended the question to non-normal distributions. The work of [7,8,9] gave the asymptotic distributions of the sample extremes. Together, these works give us the extreme value theorem, an analogue of the central limit theorem for partial or sample maxima. As this is one of the most fundamental results of EVT, we will explore it in more detail in Sect. 3.2.
In essence, both the central limit theorem and the extreme value theorem are concerned with describing the same thing: an unusual event. The event may occur as the result of an accumulation of many events or of a single event which exceeds some critical threshold, as studied by [10]. Consider a river whose water levels fluctuate seasonally. These fluctuations may erode its bank over time, or a single flash flood may break the riverbank entirely. The first is the result of a cumulative effect, with which the central limit theorem is concerned, whereas the latter is the result of an event which exceeded what the riverbank could withstand, i.e. an extreme event, with which extreme value theory is concerned.
Analogous to measuring "normal" behaviour using a mean, median or a mode, "extreme" behaviour may also be defined in multiple ways. Each definition will give rise to specific results which may be linked to, but different from, results derived from other definitions. However, these results complement each other and allow application to different scenarios and disciplines depending on the nature of the data and the question posed. The subject of EVT is concerned with extrapolation beyond the range of the sample measurements. Hence, it is an asymptotic theory by nature, i.e. its results tell us what happens as the sample size tends to infinity.
The overarching goal of any asymptotic theory is twofold: 1. to provide the necessary and sufficient conditions to ensure that a specific distribution function (d.f.) occurs in the limit, which are rather qualitative conditions, and 2. to find all the possible distributions that may occur in the limit and derive a generalised form for those distributions.
The first goal is known as the domains of attraction problem, whereas the second is known as the limit problem. Before we take a look at the asymptotic theory, it is valuable to review some of the statistical background and terminology that will be prevalent throughout this book. The rest of this chapter is dedicated to an in-depth exploration of extreme value theory, particularly the probabilistic foundations for the methods of inference presented in Chap. 4 and the ensuing application in Chap. 5. In the following section, we will introduce and clarify some of the central terminology and nomenclature. In Sect. 3.2, we will explore the fundamental extreme value theorem, which gives rise to the generalised form of the limiting distribution of sample maxima, and other relevant results. We will then consider results for the Peaks over threshold (POT) approach in Sect. 3.3. In Sect. 3.4, some theory on regularly varying functions is discussed. The theory of regular variation is quite central to EVT, though it often operates from the background. Familiarity with regular variation is highly recommended for readers interested in a mathematical and theoretical treatment of EVT. Finally, in Sect. 4.4, we consider the case where the condition of identically distributed r.v.'s can be relaxed. In Chaps. 3 and 4 as a whole, we aim to provide an intuitive understanding of the theoretical results, to expound the reasons why they are in great demand in many applications, and to illustrate them with some examples of challenges arising in the energy sector.

Basic Definitions
As was mentioned earlier, various definitions of "extremes" give rise to different, but complementary, limiting distributions. In this section, we want to formalise what we mean by "extreme" as well as introduce the terminology that will be prevalent throughout the book.
Suppose X_1, X_2, ..., X_n, ... is a sequence of independent random variables (r.v.'s). Throughout this book, and until said otherwise, we shall assume that all these r.v.'s are generated by the same physical process, and therefore it is reasonable to assume that any sample (X_1, X_2, ..., X_n), made out of (usually the first) n random variables in the sequence, is a random sample of independent and identically distributed (i.i.d.) random variables with common distribution function (d.f.) F(x) := P(X ≤ x), for all x ∈ R. The non-stationary case, where the underlying d.f. is allowed to vary with the time or location i of X_i, has been worked through in the extreme values context by [11]. We shall refrain from delving into detail on this case within this chapter, but in Chap. 4 we shall refer to the statistical methodology for inference on extremes that has sprung from the domain of attraction characterisation furnished by [11]. We shall assume the underlying d.f. is continuous with probability density function (p.d.f.) f. We also often talk about the support of X; this is the set of all values of x for which the p.d.f. is strictly positive. The lower (or left) endpoint of the support of F is denoted by x_F, i.e., x_F := inf{x ∈ R : F(x) > 0}. Equivalently, we define the upper (or right) endpoint of F by x^F := sup{x ∈ R : F(x) < 1}, which can be either finite or infinite. When these endpoints are finite, they are, probabilistically speaking, the smallest and largest values that can ever be observed, respectively. In reality, we are not likely to observe such extremes, or if we do observe extremes we do not know how close they are to the endpoints. This is only aggravated by what is generally known in many applications, such as financial or actuarial mathematics: there might be no such thing as a finite upper bound. The main broad aim of EVT is centred around this idea.
Its purpose is to enable estimation of tell-tale features of extreme events, right up to the level of that unprecedented extreme event so unlikely to occur that we do not expect it to crop up in the data. Until it does... Now that we have established that EVT sets about teetering on the brink of the sample, aiming to extrapolate beyond the range of the observed data, the sample maximum inherently features as the relevant summary statistic we will be interested in characterising. Preferably with a fully-fledged and complete characterisation, but one flexible enough to be taken up by the wider applied sciences. A probabilistic result pertaining to the sample maximum which, for its simplicity and mild assumptions, could serve as a gateway for practitioners to find bespoke tail-related models that were not previously accessible or easily interpreted: a theorem like this would be dramatically useful. It turns out that such a result, with resonance often likened to the Central Limit Theorem, already exists. This theorem, the Extreme Value or Extremal Types theorem, is the centrepiece of the next section. Before we get underway with the asymptotic (or limit) theory of extremes, we need to familiarise ourselves with the following concepts of convergence for sequences:
• A sequence of random variables X_1, X_2, ... is said to converge in distribution to a random variable X [notation: X_n →_d X] if lim_{n→∞} P(X_n ≤ x) = P(X ≤ x) at every continuity point x of the d.f. of X. This is also known as weak convergence.
• The sequence converges in probability to X [notation: X_n →_P X] if, for any ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
• The sequence converges almost surely to X [notation: X_n →_a.s. X] if P(lim_{n→∞} X_n = X) = 1. Almost sure convergence is also referred to as strong convergence.

Maximum of a Random Sample
In the classical large sample (central limit) theory, we are interested in finding the limit distribution of linearly normalised partial sums, S_n := X_1 + ··· + X_n, where X_1, X_2, ..., X_n, ... are i.i.d. random variables. Whilst the focus there is on the aggregation or accumulation of many observable events, none of them dominant, Extreme Value theory shifts to the edge of the sample where, for its huge magnitude and potentially catastrophic impact, one single event dominates the aggregate of the data. In the long run, the maximum might not be any less than the sum. Although this seems a bold claim, its probabilistic meaning will become more glaringly apparent later on when we introduce the concept of heavy tails.
The Central Limit theorem entails that the partial sum S_n = X_1 + X_2 + ··· + X_n drawn from an i.i.d. sequence, linearly normalised with constants a_n > 0 and b_n, is asymptotically normal, i.e.
(S_n − b_n)/a_n →_d N(0, 1), as n → ∞,
reflecting Charlie Winsor's principle that "All actual distributions are Gaussian in the middle". The suitable choices of constants for the Central Limit theorem (CLT) to hold in its classical form are b_n = n E(X_1) and a_n = √(n Var(X_1)). Therefore, an important requirement is the existence of a finite moment of second order, i.e. E|X_1|² < ∞. This renders the CLT inapplicable to a number of important distributions, such as the Cauchy distribution. We refer to [12, Chap. 1] for further aspects of the class of sub-exponential distributions, which includes, but is far from limited to, the Cauchy distribution.
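The failure of the classical CLT for the Cauchy distribution is easy to witness numerically. The following sketch (a minimal illustration using NumPy; the sample sizes and seed are arbitrary choices) contrasts sample means of a finite-variance distribution with those of Cauchy samples, whose mean is itself standard Cauchy however large n is.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100_000, 50

# Sample means of a finite-variance distribution (here uniform on (-1, 1))
# concentrate around the true mean 0, as the LLN/CLT predict.
unif_means = [rng.uniform(-1, 1, n).mean() for _ in range(reps)]

# Sample means of the standard Cauchy do not concentrate at all: the mean
# of n i.i.d. standard Cauchy variables is again standard Cauchy.
cauchy_means = [rng.standard_cauchy(n).mean() for _ in range(reps)]

print("spread of uniform sample means:", float(np.ptp(unif_means)))
print("spread of Cauchy sample means: ", float(np.ptp(cauchy_means)))
```

The spread (max minus min) of the uniform sample means shrinks like 1/√n, whereas the Cauchy sample means remain as dispersed as a single Cauchy observation.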
As it was originally developed, the Extreme Value theorem is concerned with partial maxima X_{n,n} := max(X_1, X_2, ..., X_n) of an i.i.d. (or weakly dependent) sequence of r.v.'s. Thus, the related extremal limit problem is to find out if there exist constants a_n > 0 and b_n such that the limit distribution of a_n^{−1}(X_{n,n} − b_n) is non-degenerate. It is worth highlighting that the sample maximum itself converges almost surely to the upper endpoint of the distribution F underlying the sampled data, for the d.f. of the maximum is given by P(X_{n,n} ≤ x) = F^n(x) → 1_{x ≥ x^F}, as n → ∞, with x^F ≤ ∞ and {X_{n,n}}_{n≥1} recognisably a non-decreasing sequence. Here, the indicator function 1_A is equal to 1 if A holds true and 0 otherwise, meaning that the limiting distribution of the maximum has mass confined to the upper endpoint. Hence, the matter now fundamentally lies in answering the following questions: (i) Is it possible to find a_n > 0 and b_n such that lim_{n→∞} P(a_n^{−1}(X_{n,n} − b_n) ≤ x) = G(x) for all continuity points x of a non-degenerate d.f. G? (ii) What kind of d.f. G can be attained in the limit? (iii) How can G be specified in terms of F? (iv) What are suitable choices for the constants a_n and b_n that question (i) refers to? Each of these questions is addressed in great detail in the excellent book by [13]. The celebrated Extreme Value theorem gives us the only three possible distributions that G can be. The extreme value theorem (with contributions from [3,8,14]) and its counterpart for exceedances above a threshold [15] ascertain that inference about rare events can be drawn from the largest (or smallest) observations in the sample. The precise statement is provided in the next theorem. The corresponding result for minima is readily accessible by using the device X_{1,n} = −max(−X_1, −X_2, ..., −X_n).
Theorem 3.1 (Extreme Value Theorem). Let X_1, X_2, ... be i.i.d. random variables with the same continuous d.f. F. If there exist constants a_n > 0 and b_n ∈ R, and some non-degenerate d.f. G such that Eq. (3.2) holds, then G must be only one of three types of d.f.'s:
Fréchet: Φ_α(x) = exp(−x^{−α}), x > 0, α > 0;
Gumbel: Λ(x) = exp(−e^{−x}), x ∈ R;
Weibull: Ψ_α(x) = exp(−(−x)^α), x ≤ 0, α > 0.
The Fréchet, Gumbel and Weibull d.f.'s can in turn be nested in a one-parameter family of distributions through the von Mises-Jenkinson parameterisation [16,17]: notably, the Generalised Extreme Value (GEV) distribution with d.f. given by
G_γ(x) = exp(−(1 + γx)^{−1/γ}), for 1 + γx > 0,   (3.6)
where for γ = 0 the right-hand side is interpreted as exp(−e^{−x}). The parameter γ ∈ R, the so-called extreme value index (EVI), governs the shape of the GEV distribution. In the literature, the EVI is also referred to as the shape parameter of the GEV. We will explore this notion more fully after establishing what it means for F to be in the maximum domain of attraction of a GEV distribution.
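For numerical work, it is convenient that the GEV family is implemented in standard statistical libraries. As a minimal sketch, SciPy's genextreme provides G_γ, with the caveat that its shape parameter c corresponds to −γ in our notation; the small wrapper below (the function name gev_cdf is ours) checks the correspondence against the closed form for one point in each of the three domains.

```python
import numpy as np
from scipy.stats import genextreme

# SciPy parameterises the GEV with shape c equal to -gamma (the sign is
# opposite to the extreme value index gamma used in this chapter).
def gev_cdf(x, gamma):
    """G_gamma(x) = exp(-(1 + gamma*x)**(-1/gamma)); gamma = 0 gives exp(-e^{-x})."""
    return genextreme.cdf(x, c=-gamma)

x = 1.0
for gamma in (0.5, 0.0, -0.5):   # Frechet, Gumbel and Weibull domains
    direct = (np.exp(-np.exp(-x)) if gamma == 0
              else np.exp(-(1 + gamma * x) ** (-1 / gamma)))
    assert abs(gev_cdf(x, gamma) - direct) < 1e-12
    print(f"gamma = {gamma:+.1f}: G_gamma(1) = {gev_cdf(x, gamma):.4f}")
```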
Definition 3.1 (Maximum domain of attraction). The d.f. F is said to be in the maximum domain of attraction of G_γ [notation: F ∈ D(G_γ)] if it is possible to redefine the constants a_n > 0 and b_n provided in Eq. (3.2) in such a way that
lim_{n→∞} F^n(a_n x + b_n) = G_γ(x),
with G_γ given by Eq. (3.6).
We now describe briefly the most salient features of a distribution belonging to each of the maximum domains of attraction:
1. If γ > 0, then the d.f. F underlying the random sample (X_1, X_2, ..., X_n) is said to be in the domain of attraction of a Fréchet distribution with d.f. Φ_{1/γ}(x) = exp(−x^{−1/γ}), x > 0. This domain encloses all d.f.'s that are heavy-tailed. Qualitatively, this means that the probability of extreme events is ultimately non-negligible, and that the upper endpoint x^F is infinite. Moreover, moments E|X|^k of order k > 1/γ are not finite [18]. Formally, heavy-tailed distributions are those whose tail probability, 1 − F(x), is larger than that of an exponential distribution.
2. If γ = 0, then F is said to be in the domain of attraction of the Gumbel distribution with d.f. Λ(x) = exp(−e^{−x}), x ∈ R. This domain gathers light-tailed distributions, whose upper endpoint can be either finite or infinite [20].
3. Lastly, for γ < 0, F is said to be in the domain of attraction of the Weibull distribution with d.f. Ψ_{−1/γ}(x) = exp(−(−x)^{−1/γ}), x ≤ 0. This domain contains short-tailed distributions with finite right endpoint. The case study presented in this book, that of electricity load of individual households, most likely belongs to the Weibull domain of attraction. This is because there is both a contractual obligation for customers to limit their electricity consumption and also a physical limit to how much the fuse box can take before it short-circuits.
Now that we have tackled the extremal limit problem, we can start to dissect the domains of attraction. There are enough results on this topic to fill multiple chapters; however, not all of them are pertinent to our case study. Thus, we take a smaller set of results for the sake of brevity and comprehension. We first present Theorem 3.2, which gives a set of equivalent conditions for F to belong to some domain of attraction. The proof of this (as of most results) is omitted but can be found in [18] (see Theorem 1.1.6). Theorem 3.2 allows us to make several important observations and connections.
As noted before, we now have two ways to see that F ∈ D(G_γ): one in terms of the tail distribution function 1 − F, and the other in terms of the tail quantile function U, defined as
U(t) := F^←(1 − 1/t) = inf{x : F(x) ≥ 1 − 1/t}, for t > 1.
We note that the upper endpoint can thus be viewed as the ultimate quantile, in the sense that x^F = U(∞) := lim_{t→∞} U(t) ≤ ∞. Secondly, we have possible forms for b_n and some indication as to what a_n might be.

Theorem 3.2.
For γ ∈ R, the following statements are equivalent:
1. There exist real constants a_n > 0 and b_n ∈ R such that
lim_{n→∞} F^n(a_n x + b_n) = G_γ(x) = exp(−(1 + γx)^{−1/γ}),   (3.9)
for all x with 1 + γx > 0.
2. There is a positive function a such that for x > 0,
lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ,   (3.10)
where for γ = 0 the right-hand side is interpreted by continuity, i.e. taking the limit as γ → 0, giving rise to log x. We use the notation U ∈ ERV_γ.
3. There is a positive function a such that for all x with 1 + γx > 0,
lim_{t→∞} t (1 − F(a(t)x + U(t))) = (1 + γx)^{−1/γ}.   (3.11)
We see in the theorem above where the theory of regular variation may come in with regard to U. Though we have deferred the discussion of regular variation to Sect. 3.4, it is useful to note that the tail quantile function U is of extended regular variation (cf. Definition 3.6) if and only if F belongs to the domain of attraction of some extreme value distribution. Extreme value conditions of this quantitative nature resonate with a rich theory and toolbox that we can borrow and apply readily, making asymptotic analysis much more elegant. We can see what the normalising constants might be, and we see that we can prove that a particular d.f. belongs to the domain of attraction of a generalised extreme value distribution either using F or using U. We may look at another theorem which links the index of regular variation with the extreme value index. Proceeding along these lines, the next theorem borrows terminology and insight from the theory of regular variation; though defined later (Definition 3.4), a slowly varying function ℓ satisfies lim_{t→∞} ℓ(tx)/ℓ(t) = 1 for all x > 0. Theorem 3.3 gives us the tail distribution of F, denoted by F̄ = 1 − F, in terms of a regularly varying function and the EVI. Noticing that F̄ is a regularly varying function means we can integrate it using Karamata's theorem (Theorem 3.13), which is useful for formulating functions f satisfying Eq. (3.12).
Theorem 3.3.
1. F is in the Fréchet domain of attraction, i.e. F ∈ D(G_γ) for γ > 0, if and only if F̄(x) = x^{−1/γ} ℓ(x) for all x > 0, with ℓ a slowly varying function.
2. F is in the Weibull domain of attraction, i.e. F ∈ D(G_γ) for γ < 0, if and only if x^F < ∞ and F̄(x^F − 1/x) = x^{1/γ} ℓ(x) for all x > 0, with ℓ a slowly varying function.
3. F is in the Gumbel domain of attraction, i.e. F ∈ D(G_0), if and only if
lim_{t→x^F} F̄(t + x f(t))/F̄(t) = e^{−x}   (3.12)
for all x ∈ R, with f a suitable positive auxiliary function. If the above equation holds for some f, then it also holds with f(t) := ∫_t^{x^F} F̄(s) ds / F̄(t), where the numerator of the integral exists finite for t < x^F.

Theorem 3.4. F ∈ D(G_γ) if and only if, for some positive function f,
lim_{t→x^F} (1 − F(t + x f(t)))/(1 − F(t)) = (1 + γx)^{−1/γ}   (3.13)
for all x with 1 + γx > 0. If the above holds for some f, then it also holds with f(t) = a(1/(1 − F(t))), with a the function appearing in Theorem 3.2. Moreover, any f that satisfies Eq. (3.13) also satisfies f(t)/t → γ as t → ∞ when γ > 0, and f(t)/(x^F − t) → −γ as t → x^F when γ < 0. In this section, we have thus far mentioned a suitable function f which plays various roles; however, it should not be interpreted as the probability density function of F unless explicitly stated as such. Theorem 3.4 gives us alternative forms for f and its limit relations.

We briefly consider maxima that have not been normalised and that have been normalised, for various sample sizes n = 1, 7, 30, 365, 3650 (Fig. 3.3). The left plot of Fig. 3.3 shows the distribution of the maxima, where the lines represent, from left to right, each of the sample sizes. The right plot shows how quickly the normalised maxima approach the Gumbel distribution; the exponential distribution belongs to the Gumbel domain of attraction. The appropriate normalising constants for F standard exponential are a_n = 1 and b_n = log(n). Deriving this is left as an exercise. Doing the same, except now with F standard normal, Fig. 3.4 shows that the convergence is slow. In this case, a_n = (2 log n)^{−1/2} and b_n = (2 log n)^{1/2} − 0.5 (2 log n)^{−1/2} (log log n + log 4π). As before, the standard normal distribution belongs to the Gumbel domain of attraction.
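The fast convergence in the exponential case can be checked directly, without simulation, since the d.f. of the normalised maximum is available in closed form. The sketch below (our own illustration; the grid of evaluation points is an arbitrary choice) compares F^n(a_n x + b_n) = (1 − e^{−(x + log n)})^n with the Gumbel d.f.

```python
import numpy as np

def gumbel_cdf(x):
    return np.exp(-np.exp(-x))

# For F standard exponential, the d.f. of the maximum M_n is F^n, so with
# a_n = 1 and b_n = log(n): P(M_n - log n <= x) = (1 - exp(-(x + log n)))^n.
def dist_to_gumbel(n, grid):
    """Sup distance on the grid between the d.f. of M_n - log n and the Gumbel d.f."""
    exact = (1.0 - np.exp(-(grid + np.log(n)))) ** n
    return float(np.max(np.abs(exact - gumbel_cdf(grid))))

grid = np.linspace(-1.0, 5.0, 200)
errs = [dist_to_gumbel(n, grid) for n in (7, 30, 365, 3650)]
print([round(e, 4) for e in errs])  # shrinking roughly like 1/n
```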
Theorem 3.5 gives the sequences a_n and b_n that should be used to normalise the maxima when F is in the maximum domain of attraction of a specific G_γ. Note that the values of a_n and b_n change with the sample size n. If γ were known beforehand, knowing the true value of the normalising constants may help with simulations or numerical experiments. However, in practice we do not know γ and it must be estimated. The von Mises condition gives us a workaround.

Theorem 3.6 (von Mises Condition). Let r(x) be the hazard function defined by
r(x) := f(x)/(1 − F(x)),
where f(x) is the probability density function and F is the corresponding d.f.

If r(x) is ultimately positive in the left neighbourhood of x^F, is differentiable there, and satisfies
lim_{x→x^F} (d/dx)(1/r(x)) = γ,
then F ∈ D(G_γ).
The von Mises condition given in Theorem 3.6 is particularly useful when one is interested in conducting simulations. We may sample from a known distribution F which readily gives us the probability density function f. Thus, without knowledge of the appropriate normalising constants, the von Mises condition allows us to find the domain of attraction of F.
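For instance, the condition can be verified symbolically. The following sketch (using SymPy; the helper von_mises_gamma is our own naming) recovers γ = 0 for the standard exponential distribution and γ = 1/α for the Pareto distribution with tail x^{−α}.

```python
import sympy as sp

x, a = sp.symbols('x alpha', positive=True)

def von_mises_gamma(f, F, limit_point):
    """Return lim (d/dx)(1/r(x)) with r = f/(1 - F): under the von Mises
    condition this limit is the extreme value index gamma."""
    r = f / (1 - F)
    return sp.limit(sp.diff(1 / r, x), x, limit_point)

# Standard exponential: r(x) = 1, so gamma = 0 (Gumbel domain).
print(von_mises_gamma(sp.exp(-x), 1 - sp.exp(-x), sp.oo))        # -> 0

# Pareto with tail x**(-alpha): gamma = 1/alpha > 0 (Frechet domain).
print(von_mises_gamma(a * x**(-a - 1), 1 - x**(-a), sp.oo))      # -> 1/alpha
```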
We have discussed the asymptotic theory of the maximum of a sample. Earlier we mentioned that, in practice, we divide the data into blocks of length n, take the maximum of each block, and conduct inference on these maxima. The results we have discussed in this section tell us what happens as the block size becomes infinitely large. The approach of sampling maxima from blocks is, unsurprisingly, known as the Block Maxima approach. As [21] pointed out, the block maxima model offers many practical advantages (over the Peaks Over Threshold approach, Sect. 3.3). The block maxima method is the appropriate statistical model when only the most extreme data are available; for example, historically, temperature data were recorded as daily minimum, average and maximum. In cases where the time series may have periodicity, we can remove some dependence by dividing the data into blocks in such a way that dependence may exist within each block but not between blocks. We will now consider an alternative but equivalent method.
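Extracting a block maxima sample is straightforward in code. The sketch below (in Python, with simulated rather than real load values; the gamma-distributed series is purely illustrative) splits a half-hourly series into daily blocks of 48 readings and keeps each block's maximum.

```python
import numpy as np

def block_maxima(series, block_size):
    """Split a series into consecutive blocks and return each block's maximum.

    Any incomplete trailing block is discarded, so inference is based only
    on maxima of full blocks.
    """
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // block_size
    trimmed = series[:n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).max(axis=1)

# E.g. two years of half-hourly electricity load -> daily maxima
# (48 half-hour readings per day); the load values here are simulated.
rng = np.random.default_rng(0)
load = rng.gamma(shape=2.0, scale=0.5, size=2 * 365 * 48)
daily_max = block_maxima(load, block_size=48)
print(daily_max.shape)  # (730,)
```

With periodic data, choosing the block boundaries to coincide with the period (here, whole days) is exactly the device mentioned above for confining dependence within blocks.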

Exceedances and Order Statistics
When conducting inference on the tail of a distribution, it is wasteful to consider only the most extreme observation. We may be able to glean valuable information by utilising more than just the maximum. For such cases, we may study either exceedances over a (high) threshold (Sect. 3.3.1) or we may consider order statistics (Sect. 3.3.2). In each case, we get different limiting distributions. In what follows we will discuss what the limiting distributions are in each case and how they relate to the extreme value distributions and the results from Sect. 3.2.

Exceedances
In this instance, the idea is that statistical inference is based on observations that exceed a high threshold u, i.e., either on X_i or on X_i − u, provided that X_i > u, for i ≤ n. The exact conditions under which the POT model holds are justified by second order refinements (cf. Sect. 3.4), whereas it has typically been taken for granted that the block maxima method follows the extreme value distribution very well. We saw this in the discussion of Fig. 3.4. For large sample sizes, the performance of the block maxima method and the peaks over threshold method is comparable. However, when the sample is not large, there may be some estimation problems for which the Peaks over threshold (POT) model is more efficient [21].
Since we have familiarised ourselves with the convergence of partial maxima, we now do the same for exceedances. We will show that appropriately normalised exceedances converge to the Generalised Pareto distribution. This is the POT model. The work on exceedances was independently initiated by [15,22]. As before, we will start with definitions and then proceed to establish the Generalised Pareto as the limiting distribution.

Definition 3.2.
Let X be a random variable with d.f. F and right endpoint x^F. Suppose we have a threshold u < x^F. Then the d.f., F_u, of the random variable X over the threshold u is defined to be
F_u(x) := P(X − u ≤ x | X > u) = (F(x + u) − F(u))/(1 − F(u)), for 0 ≤ x < x^F − u.

Definition 3.3. The Generalised Pareto distribution is defined as
H_γ(x) := 1 − (1 + γx)^{−1/γ} for γ ≠ 0, and H_0(x) := 1 − e^{−x},
where x ≥ 0 when γ ≥ 0, and 0 ≤ x ≤ −1/γ when γ < 0.
Note that, as for G_γ, the Generalised Pareto distribution can also be furnished with location and scale parameters μ and σ > 0, by replacing x above with (x − μ)/σ. Now that we have looked at the d.f. of exceedances and defined the Generalised Pareto distribution, the latter can be established as the limiting distribution of exceedances.

Theorem 3.7 (Balkema-de Haan-Pickands Theorem). One can find a positive, measurable function β such that
lim_{u→x^F} sup_{0≤x<x^F−u} |F_u(x) − H_γ(x/β(u))| = 0   (3.17)
if and only if F ∈ D(G γ ).
Not only does the Balkema-de Haan-Pickands theorem allow us to use more than just the maximum, it also connects the d.f. of the random variables to that of the exceedances over a threshold; from knowing the limiting distribution of F_u, we also know the domain of attraction of F, and vice versa. The shape parameter in both cases is the same, and thus its interpretation is the same as before, i.e., γ describes the tail-heaviness of F if Eq. (3.17) is satisfied. Holmes and Moriarty [23] used the Generalised Pareto distribution to model particular storms of interest for applications in wind engineering, and [24] used the POT method to analyse financial risk.
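The POT model can be put to work with standard tools. In the sketch below (using SciPy, on simulated standard Pareto data, for which the excesses over any threshold are exactly generalised Pareto with shape γ = 1/α and scale uγ), the shape parameter is recovered by maximum likelihood; the 95% quantile threshold is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(7)

# Simulated heavy-tailed data: a standard Pareto sample with tail index
# alpha = 2, so the true extreme value index is gamma = 1/alpha = 0.5.
alpha = 2.0
data = rng.pareto(alpha, size=100_000) + 1.0   # standard Pareto on (1, inf)

u = np.quantile(data, 0.95)      # a high threshold
excesses = data[data > u] - u    # the POT sample

# SciPy's genpareto shape parameter c coincides with gamma as used here.
gamma_hat, _, scale_hat = genpareto.fit(excesses, floc=0)
print("estimated gamma:", round(gamma_hat, 3), "(true value 0.5)")
```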

Asymptotic Distribution of Certain Order Statistics
In the previous section, we noted that the POT approach can use data more efficiently. This efficiency relies on choosing the threshold appropriately. If the threshold is too low, then the exceedances are no longer from the tail and bias dominates. On the other hand, if the threshold is too high, then very few data points exceed it, the variance is high, and confidence in the results is low. We can formalise this idea of balancing bias and variance by considering a certain kind of order statistic. This is the topic of this section.
Suppose X_1, X_2, ..., X_n, ... are i.i.d. random variables with common d.f. F. If we take a finite sample X_1, ..., X_n and order it from minimum to maximum, then we get the n order statistics:
X_{1,n} ≤ X_{2,n} ≤ ··· ≤ X_{n,n}.   (3.18)
Furthermore, we can define the kth upper order statistic, X_{n−k,n}, to be the kth largest value in the finite sample, so that small values of k correspond to the largest observations and values of k close to n to the smallest. Depending on k and its relation to n, the kth upper order statistic can be classified in at least three different ways, which lead to different asymptotic distributions. Arnold et al. [13] classified X_{n−k,n} as one of the following three types of order statistics: 1. Central Order Statistics: X_{n−k,n} is considered to be a central order statistic if k = [np] + 1, where 0 < p < 1 and [·] denotes the smallest integer larger than the argument. 2. Extreme Order Statistics: X_{n−k,n} is an extreme order statistic when either k or n − k is fixed and n → ∞.

3. Intermediate Order Statistics: X_{n−k,n} is an intermediate order statistic if both k and n − k approach infinity but k/n → 0 or 1. In this book we present results for k/n → 0, and we also assume that k varies with n, i.e. k = k(n).
Note that the conditions which ensure that X_{n−k,n} is an intermediate order statistic embody a similar notion of balancing bias and variance; insisting that k/n → 0 means that the data points larger than X_{n−k,n} form a small part of the entire sample, which ensures the analysis pertains to the tail of the distribution. However, for the asymptotic results to hold, some of which we have seen in Sects. 3.2 and 3.3.1 and will see in this section, we require a large enough (sub)sample, i.e. k should go to infinity. As such, identifying k appropriately is a crucial and non-trivial part of extreme value analysis; it also proves useful for the POT model, as it allows us to choose the threshold u to be the value corresponding to the intermediate order statistic X_{n−k,n}.
Since we use intermediate order statistics in our case study on electricity load in Chap. 5, they are of more immediate interest to us; but for the sake of completeness and intuitive understanding, we discuss the asymptotic distribution of all three types of order statistics. First, we consider the convergence of the kth upper order statistic.

Note that one result of Theorem 3.8 relates to intermediate order statistics, whereas another relates to central order statistics. The proof is simple and can be found in [12]. We now proceed to the discussion of the asymptotic distribution of each type of order statistic.

Theorem 3.9 (Asymptotic distribution of a central order statistic). Let x_p := F^{−1}(p) denote the p-quantile of F, for 0 < p < 1, and suppose f(x_p) > 0. Then, as n → ∞,
√n (X_{[np]+1,n} − x_p) →_d N(0, p(1 − p)/f(x_p)²),   (3.19)
where N(μ, σ²) denotes a normal distribution with mean μ and variance σ². Thus note that central order statistics, when appropriately normalised, converge to the normal distribution. This property is known as asymptotic normality and is particularly desirable for estimators, as it allows for the construction of confidence intervals with relative ease. The proof of Theorem 3.9 can be found in [13]. We can consider the asymptotic distribution of the extreme order statistics (also known as the upper order statistics), which no longer exhibit asymptotic normality. Instead, in this case, we recover links to the Generalised Extreme Value distribution, G.

Theorem 3.10 (Asymptotic distribution of an extreme order statistic). For any real x, P(X_{n,n} ≤ a_n x + b_n) → G(x) as n → ∞ if and only if, for any fixed k,
P(X_{n−k,n} ≤ a_n x + b_n) → G(x) Σ_{j=0}^{k} (−log G(x))^j / j!, as n → ∞,   (3.20)
for all x.
The proof can be found in [13]. Note that F being in the domain of attraction of an extreme value distribution implies that Eq. (3.20) holds with the same a_n and b_n, which establishes a strong link between the asymptotic behaviour of extreme order statistics and that of the sample maximum. However, when k is allowed to vary with n, as for intermediate order statistics, we again acquire asymptotic normality. A proof of Theorem 3.11 can be found in [25]. Thus we see that although an intermediate order statistic sits somewhere between central order statistics and extreme order statistics, and is intuitively closer to the latter, its asymptotic behaviour is more akin to that of central order statistics. Theorem 3.11 also gives us the appropriate normalisation. Theorems 3.9 and 3.10 can likewise be used to choose the appropriate normalisation for the relevant order statistics, which is useful for numerical simulation. In practice, we will also need to check the von Mises condition or other relevant assumptions.
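The asymptotic normality of a central order statistic is easy to check by simulation. For the sample median of standard normal data, p = 1/2, x_p = 0 and f(0) = 1/√(2π), so the limiting variance of √n · (median − x_p) is p(1 − p)/f(x_p)² = π/2. A minimal sketch (our own illustration; the sample size and number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Central order statistic: the sample median of standard normal data.
# With p = 1/2, x_p = 0 and f(0) = 1/sqrt(2*pi), the limiting variance of
# sqrt(n) * (median - x_p) is p*(1 - p)/f(x_p)**2 = pi/2 ~ 1.5708.
n, reps = 1001, 2000
medians = np.array([np.median(rng.standard_normal(n)) for _ in range(reps)])
z = np.sqrt(n) * medians

print("sample variance of sqrt(n)*median:", round(float(z.var()), 3))
print("theoretical limit pi/2:           ", round(np.pi / 2, 3))
```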
Order statistics are particularly useful as they are used to build various estimators for γ and x_F. The commonly used Hill estimator, valid for γ > 0, is one example, as is the more general Pickands estimator.
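A minimal sketch of the Hill estimator follows; the data-generating distribution, seed, and choice of k are illustrative assumptions, and in practice k must be chosen with care (e.g. via a Hill plot).

```python
import numpy as np

rng = np.random.default_rng(1)

def hill_estimator(data, k):
    """Hill estimator of the extreme value index gamma > 0, built from
    the k largest order statistics:
    (1/k) * sum_{i=0}^{k-1} log X_{n-i,n}  -  log X_{n-k,n}."""
    x = np.sort(data)
    top = np.log(x[-k:])              # k largest observations, log scale
    return top.mean() - np.log(x[-k - 1])

# Standard Pareto samples with tail index alpha = 2, i.e. gamma = 1/alpha = 0.5.
n, k = 10000, 500
data = rng.pareto(2.0, size=n) + 1.0  # shift Lomax samples onto [1, inf)
print(hill_estimator(data, k))        # should be close to 0.5
```

The Pickands estimator can be sketched analogously from the order statistics X_{n−k,n}, X_{n−2k,n}, and X_{n−4k,n}; it covers any real γ at the cost of higher variance.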

Extended Regular Variation
We have already alluded to the topics in this section; however, due to their technical complexity, they are given only at the end. The theory of regular variation provides us with a toolbox for understanding various functions that we have come across. Moreover, to set the theory that we have discussed within a wider framework, stronger conditions are necessary; these conditions follow readily once we are familiar with the theory of regular variation. The topics in this section may seem disjointed and irrelevant, but they are in fact instrumental to making extreme value theory as rich and robust as it is. We will start with the fundamentals: a measurable function f : R+ → R+ is said to be slowly varying (at infinity) if f(tx)/f(t) → 1 as t → ∞ for all x > 0. Similarly, we can offer a more general version as follows.
A measurable function f : R+ → R is said to be regularly varying (at infinity) if, for all x > 0,

f(tx)/f(t) → x^ρ as t → ∞, (3.22)

for some ρ ∈ R; ρ is called the index of regular variation [notation: f ∈ RV_ρ]. Note that if f satisfies Eq. (3.22) with ρ = 0, then f is slowly varying. Strictly speaking, the above definitions require f : R+ → R to be Lebesgue measurable; we can readily assume this, as most functions in our case are continuous and thus Lebesgue measurable. Note also that every regularly varying function f can be written in terms of a slowly varying function ℓ, i.e., if f ∈ RV_ρ, then f(x) = x^ρ ℓ(x) where ℓ ∈ RV_0. Note then that in Theorem 3.3, the tail of F was regularly varying in both the Fréchet and Weibull cases. We can make this even more general by considering functions that are of extended regular variation and/or belong to a class of functions denoted by Π.
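The defining ratio limit is easy to inspect numerically. The function chosen below, f(x) = x^2 log x, is an illustrative example of an RV_2 function with a non-trivial slowly varying factor; it is not taken from the text.

```python
import numpy as np

# f(x) = x**2 * log(x) is regularly varying with index rho = 2:
# f(t*x)/f(t) -> x**2 as t -> infinity, the slowly varying factor
# log(x) washing out of the ratio.
def f(x):
    return x**2 * np.log(x)

x = 2.0
for t in [1e3, 1e6, 1e12]:
    print(t, f(t * x) / f(t))  # approaches x**2 = 4 as t grows
```

Note how slowly the logarithmic factor disappears: even at t = 10^12 the ratio is still a few percent above 4, a useful reminder that slowly varying factors can bite in finite samples.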

Definition 3.6.
A measurable function f : R+ → R is said to be of extended regular variation if there exists a function a : R+ → R+ such that, for some α ∈ R\{0} and for all x > 0,

(f(tx) − f(t))/a(t) → (x^α − 1)/α as t → ∞ [Notation: f ∈ ERV_α].

The function a is called the auxiliary function for f. While we do not show this, a ∈ RV_α. We can now observe that F ∈ D(G_γ) =⇒ U ∈ ERV_γ with auxiliary function a(t) (cf. Theorem 3.2). Moreover, we can link f to regular variation as follows.
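As an illustrative sketch (the choice of U is a hypothetical Fréchet-type tail quantile function, not from the text), the ERV limit can be exhibited exactly: for U(t) = t^γ with auxiliary function a(t) = γ t^γ, the defining ratio equals (x^γ − 1)/γ for every t.

```python
# Sketch of extended regular variation: for the hypothetical tail
# quantile function U(t) = t**gamma, choosing a(t) = gamma * t**gamma gives
#     (U(t*x) - U(t)) / a(t) = (x**gamma - 1) / gamma
# exactly for every t, so U is in ERV_gamma with this auxiliary function.
gamma = 0.5

def U(t):
    return t**gamma

def a(t):
    return gamma * t**gamma

t, x = 1e6, 4.0
lhs = (U(t * x) - U(t)) / a(t)
rhs = (x**gamma - 1) / gamma
print(lhs, rhs)  # both equal 2.0
```

For less tidy choices of U the equality holds only in the limit t → ∞, which is the generic ERV situation.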
The proof can be found in Appendix B of [18]. Since we now have a relation connecting the normalising constants and the EVI to the index of regular variation, it can be used to construct estimators for the EVI. It can also be used in simulations where the true value is known or can be calculated.

Definition 3.7.
A measurable function f : R+ → R is said to belong to the class Π if there exists a function a : R+ → R+ such that, for all x > 0,

(f(tx) − f(t))/a(t) → log x as t → ∞,

where a is again the auxiliary function for f [Notation: f ∈ Π or f ∈ Π(a)]. In this case, a is measurable and slowly varying. Note that functions belonging to the class Π are a special case of functions of extended regular variation, namely those with index 0. Next we consider Karamata's theorem, which gives us a way to integrate regularly varying functions.
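The canonical member of the class Π is the logarithm, sketched below (the choice of f = log with constant auxiliary function a(t) = 1 is the standard textbook example, not a claim about this chapter's examples).

```python
import numpy as np

# The logarithm belongs to the class Pi with auxiliary function a(t) = 1:
#     (log(t*x) - log(t)) / 1 = log(x)   for every t > 0,
# so the defining limit holds exactly, not just asymptotically.  Note that
# log is slowly varying (RV_0), consistent with Pi being the index-0 case
# of extended regular variation.
t, x = 1e8, 3.0
print(np.log(t * x) - np.log(t))  # equals log(3)
```

This also illustrates why the auxiliary function of a Π-class function must be slowly varying: here it is constant.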
Note that in Theorem 3.13, the converse for α = −1 does not necessarily imply that f is regularly varying. It should by now be clear how the definitions and theorems we have looked at so far are relevant; we have provided examples of functions used in this report that satisfy one or more of these definitions. Recall that in Sect. 3.3, we mentioned second order refinements. The next part, though rather terse at first glance, provides valuable information for the prediction of distinctive features in extreme data. We shall look further at the extended regular variation of U in Eq. (3.10) (i.e., Eq. (3.6) specialised to U) to give a thorough insight into how the normalised spacings of quantiles attain the GPD tail quantile function in the limit. The second order refinement below addresses the order of convergence in Eq. (3.10). Definition 3.8. The function U is said to satisfy the second order refinement if, for some positive function a and some positive or negative function A with lim_{t→∞} A(t) = 0,

( (U(tx) − U(t))/a(t) − D_γ(x) ) / A(t) → H(x) as t → ∞, x > 0, (3.26)

where H is some function that is not a multiple of D_γ(x) := (x^γ − 1)/γ.
The non-multiplicity condition is merely to avoid trivial results. The functions a and A may be called the first-order and second-order auxiliary functions, respectively. As before, the function A controls the speed of convergence in Eq. (3.10). The next theorem establishes the form of H and gives some properties of the auxiliary functions. Moreover, for x > 0,

H(x) = c_1 ∫_1^x s^{γ−1} ∫_1^s u^{ρ−1} du ds + c_2 ∫_1^x s^{γ+ρ−1} ds, (3.28)

where the results are understood in continuity, i.e. taking the limit as ρ and/or γ goes to zero. Without loss of generality, the constants featuring in the above can be fixed at c_1 = 1 and c_2 = 0 (cf. Corollary 2.3.4 of [18]). This gives us that

H_{γ,ρ}(x) = (1/ρ) ( (x^{γ+ρ} − 1)/(γ+ρ) − (x^γ − 1)/γ ), (3.29)

where each factor of the form (x^λ − 1)/λ is interpreted in the limiting sense as log x when γ = 0 and/or ρ = 0. The parameter ρ describes the speed of convergence in Eq. (3.26): ρ close to zero implies slow convergence, whereas if |ρ| is large, then convergence is fast. The above theorem results from the work of [26]. Finally, we can provide the sufficient second order condition of von Mises type. These definitions and results may seem unrelated or arbitrary, but in fact some of the proofs of other results borrow from the theory of regular variation, and functions such as the tail quantile function U are seen to be of extended regular variation. Thus, regular variation theory allows us to extend the theory of extremes much further in a very natural way: it enables a full characterisation, at high levels, of the process generating the data by looking at the asymptotic behaviour of the exceedances above a sufficiently high threshold. It also allows us to prove asymptotic normality for various estimators. Thus, though quite involved, it is a very useful tool in extreme value analyses and is highly recommended for the enthusiastic or mathematically motivated reader.
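The role of ρ as a speed-of-convergence parameter can be sketched numerically. The tail quantile function U below is a hypothetical construction (not from the text) built so that the first-order error decays exactly like A(t) = t^ρ.

```python
# A sketch of how the second-order parameter rho controls convergence in
# the first-order relation.  Take the hypothetical tail quantile function
#     U(t) = t**gamma / gamma + t**(gamma + rho) / (gamma + rho)
# with a(t) = t**gamma.  Then the first-order error
#     (U(t*x) - U(t))/a(t) - (x**gamma - 1)/gamma
# equals t**rho * (x**(gamma+rho) - 1)/(gamma+rho), i.e. it decays
# like A(t) = t**rho, so rho near zero means much slower convergence.
gamma, x, t = 0.5, 2.0, 100.0

def error(rho):
    U = lambda s: s**gamma / gamma + s**(gamma + rho) / (gamma + rho)
    a = t**gamma
    return abs((U(t * x) - U(t)) / a - (x**gamma - 1) / gamma)

print(error(-0.1), error(-1.0))  # the rho = -1 error is far smaller
```

At t = 100 the ρ = −0.1 error is roughly fifty times larger than the ρ = −1 error, which is exactly the finite-sample difficulty that second-order refinements quantify.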
In conclusion, extreme value theory gives us a broad and well-grounded foundation to extrapolate beyond the range of available data. Using either sample maxima or exceedances over a threshold, valuable inferences about extremes can be made. These are made rigorous by the first order and second order conditions, which are underpinned by the still broader theory of regular variation. Moreover, we have techniques to conduct these analyses even when the conditions of independence and stationarity do not hold. These results have already been adapted to fields such as finance, flood forecasting, and climate change. They are accessible to yet more fields, and in this book they will be adapted for electricity demand in low-voltage networks.