A Point Process Characterisation of Extreme Temperatures: an Application to South African Data

The point process (PP) modelling approach is considered a more elegant alternative of extreme value analysis. This is because of its capability in modelling both the frequency and intensity rates of the occurrence of extremes. In this paper, we demonstrate the use of the PP modelling approach in which stationary and non-stationary models are used in modelling average maximum daily temperature (AMDT) in South Africa. The data constitutes average daily temperature observations that are collected by the South African Weather Services over the period 1 January 2000 to 30 August 2010. This study is interested on the occurrence of extreme high temperature and because of that the data for non-winter season (1 September to 30 April) of each year is used. A penalised regression cubic smoothing spline function is used for non-linear detrending of the data and determining a fixed threshold above which excesses are extracted and used. An extremal mixture model is then fitted to determine a threshold in which a boundary corrected kernel density is fitted to the bulk model and a generalised Pareto distribution (GPD) fitted to the tail of the distribution. The data exhibits properties of short-range dependence and strong seasonality, leading to declustering. An interval estimator method is used to decluster data for the purpose of fitting PP models to cluster maxima. The models that are used in this paper are nested and, as a result, likelihood ratio tests are conducted using the deviance statistic. The tests support the fit of the stationary PP model. We further fitted the stationary GPD and used the formal tests which are the Cramér-von Mises test and the Anderson-Darling test to diagnose fit. These tests and the diagnostic plots support fit of the stationary GPD to cluster maxima. Uncertainty of the estimates of GPD parameters is assessed in this paper using bootstrap re-sampling approach. The stationary PP model was used with the reparameterisation approach to determine frequency of the occurrence of extremely hot days, which are found to be 15 times per year. The modelling framework and results of this study are important to power utility companies in scheduling and dispatching electricity to customers during a hot spell.


Introduction
Planners and decision makers in power utility companies like Eskom, South Africa's power utility company, face uncertainties in the demand for electricity [1]. This is due to several factors that include weather conditions like temperature which is a global major driver of electricity demand [1,2]. Modelling of electricity demand using climate variables (AMDT) in South Africa. We discuss the details of the case study in Section 3.
The principal motivation for this paper is based on the fact that the occurrence of extremely low or extremely high temperatures influences electricity demand in both the domestic and industrial electricity users. During non-winter season, the electricity users switch on cooling systems until a point at which all the cooling systems are on, implying an extreme increase in electricity demand [1]. The purpose of this paper is to demonstrate and compare the extent to which the PP and the generalised Pareto distribution (GPD) modelling approaches can be used in modelling AMDT in South Africa. The temperatures are modelled for the purpose of studying the impact of extreme AMDT on electricity demand in South Africa.

Literature Review on Related Problems
Abatan et al. [4] analyse trends in extreme temperature over Nigeria from percentile-based threshold indices. The study focuses on spatial and temporal trends in extreme temperature for 21 stations in Nigeria over the years 1971-2012. It was concluded in Abatan et al. [4] that although a majority of the stations have significant trends on warm days and warm nights, the annual trend is greatest on warm nights. Thornton et al. [5] discuss the role of temperature in the variability and extremes of electricity and gas demand. The paper investigates the daily relationship of electricity and gas demand with temperature in Great Britain from 1975 to 2013 and 1996 to 2013, respectively. Extreme demand periods are assessed using detrended daily temperature observations from 1772. The results establish that 1 in 20 year peak day electricity and gas demand estimates are above their average winter day demand [5].
Chikobvu and Sigauke [6] discuss the modelling of the influence of temperature on average daily electricity demand in South Africa using a piecewise linear regression model and the generalised extreme value theory approach over the years 2000-2010. The empirical results show that electricity demand in South Africa is highly sensitive to cold temperatures. It was further concluded that extreme low average daily temperatures of the order of 8.2 • C are very rare in South Africa and that they only occur about 8 times in a year, resulting in huge increases in electricity demand [6].
Lebotsa et al. [7] discuss an application of partially linear additive quantile regression models for predicting shortterm electricity demand during the peak demand hours from 18:00 to 20:00 using South African data for January 2009 to June 2012. Results from the unit commitment problem show that it is very costly to use gas-fired generating units and that for any increase in demand above the median quantile forecasts, it will be economical to increase the generation of electricity from generating unit [7]. Although temperature is an important weather variable in the energy sector, it is important to study the impact of other climate variables such as water supply, wind speed, and humidity on electricity demand.
Sridharan et al. [8] develop a framework consisting of long-term models for electricity supply and water systems management to assess the vulnerability of potential expansion plans to the effects of climate change. This study demonstrates that failing to climate-proof infrastructure investments can result in significant electricity price fluctuations in selected countries such as Uganda and Tanzania, while others such as Egypt are less vulnerable [8].
Wangsa and Wee [9] propose an integrated distribution system model for electricity supply chain consisting of a power plant, a transmission substation, a distribution substation, and multi-customers. A procedure to derive the optimal solution and the decision variables is developed using differential calculus. The results of the study revealed that the cost decreases as the blackout ratio and distribution substation capacity increase and that the cost is insensitive to the changing electricity consumption time [9].
Like any other country, South Africa faces problems when it comes to the sufficient supply of electricity. Depending on the availability of necessary resources, the use of renewable energy can be more beneficial to both the energy sector and electricity consumers. Aliyu et al. [10] review the prospects of four major renewable energy sources which are hydro, solar, wind, and biomass for each of the three leading countries in Africa, namely South Africa, Egypt, and Nigeria. Of these three countries, it is emphasised that South Africa has the best model for tackling its energy challenge through the use of wind and solar energy. In the three countries and indeed across the African continent, access to renewable energy for off-grid, small-scale application is key to successful resolution to energy crisis, sustainable development, and fighting climate change [10].
Flowers et al. [11] assess climate impacts on the cost of solar energy in the United States of America. The assessment is based on the formulation of an extended levelised cost of energy calculation which considers photovoltaic module performance and lifespan as a function of local climate. This assessment is considered to be the first to quantify trade-offs between technologies, geographies, and policies in a unified manner [11].
The occurrences of extreme observations lead to the use of EVT which comprises a rich family of techniques that are suitable for modelling tail behaviour of thin or heavy-tailed distributions [12]. The use of symmetrical techniques in such cases leads to limitations such as unreliable estimates since most of the sample values fall outside the tail of the distribution, as well as the difficulty in estimating the model parameters that would lead to a good fit in the tails [3,12].
The use of EVT in modelling electricity demand is usually preferred in the energy sector because of the capability of EVT in modelling tail behaviour of thin or heavy-tailed distributions.
The use of generalised additive models in predicting medium-term hourly electricity demand using South African data for the years 2009 to 2013 is discussed in Sigauke [13]. The selection of variables is done using least absolute shrinkage and selection operator via hierarchical interactions. The study shows significantly improved accuracy of forecasts that may serve as a useful tool for decision makers and system operators in power utility companies, particularly in maintenance planning including mediumterm risk assessment [13].
One of the vital tasks in statistical modelling is the assessment of the extent to which the models fit the data. Davison and Smith [14] fit the GPD over several thresholds with applications in hydrology, followed by calculating the Anderson-Darling and the Kolmogorov-Smirnov statistics to test fit. The limitation that was encountered is the lack of the formal tests that are suitable to diagnose fit of the GPD. Because of this limitation, the authors opted for the use of tables in testing for exponential property, which was unsuccessful since there were too high test statistics [14]. Choulakian and Stephens [15] discuss the formal goodnessof-fit tests of the GPD models together with applications. These tests are based on the Cramér-von Mises and the Anderson-Darling statistics.

Modelling Non-stationary Extremes Using the Point Process Approach
The key benefit of using the PP tail model over the GPD is that the former can more easily be extended to the nonstationary case by allowing a change in the intensity or number of threshold exceedances over time or space, which has a natural physical interpretation [16]. Furthermore, since the PP model describes the threshold exceedance process over time or space, it is more straightforward to consider representation of dependence in the extremes. This includes for example, clustering of exceedances, which are frequently observed in various applications such as finance and hydrology [16].
Let Y = {Y 1 , . . . , Y n } denote a sequence of AMDT with one observed covariate x such that X = {X 1 , . . . , X n }. Of these n observations, let n τ denote the number of exceedances above a sufficiently high predetermined threshold, τ . In this paper, a time-varying covariate is used. Pickands [16] showed that the point process defined by: is well approximated in the limits as n → ∞, by an inhomogeneous Poisson process on the region A = [0, 1] × (τ, ∞), for a sufficiently high threshold τ , with the intensity function on the subregion B = (t 1 , t 2 ) × (x, ∞) given by: where x > τ and the scaling constant n τ is the number of blocks of observations [16]. In this paper, n τ is considered as the number of exceedances with probability shown in the first component of Eq. 2.
Modelling threshold exceedances using the PP framework follows a process similar to that of threshold modelling via the GPD, with the PP conditional probability model parameterised by (μ, σ, ξ) [16]. The application of the PP modelling framework relies on the choice of a suitably high threshold τ , above which the asymptotically motivated models provide a reliable approximation. The likelihood for the extreme value mixture (EVM) model is separable out into the contribution from the observations below the threshold (the kernel density component) and those above the threshold (the GPD or PP tail model). The PP likelihood used in this paper is obtained by straightforward application of the Poisson likelihood with inhomogeneous intensity function given in Eq. 1.
The non-stationary likelihood function of the PP model over the region [0, 1] × (τ, ∞) is given by Macdonald et al. [17]: for ξ(x i ) = 0, θ = (μ(x), σ (x), ξ(x)) the parameters modelled as functions of the covariate x, where B = {i : In Eq. 2, θ denote a vector of parameters, μ is the location parameter, σ is the scale parameter, and ξ is the shape parameter. If the covariate x is the time-varying effect denoted by t, the model is then written in a non-stationary context as follows: The first component of the likelihood function in Eq. 2 represents the probability of getting n τ exceedances, where n τ is Poisson distributed and discussed under Eq. 1. We now define a stationary model and various non-stationary models for modelling the variation on the location and scale parameters using linear, exponential, and quadratic transformations.
According to Coles [3], it is usually challenging to model the variation in the shape parameter. In this paper, we also limit the non-stationary modelling to the location and the scale parameters only. The models (M 0 , . . . , M 3 ) are considered in this paper: Model M 0 is stationary and the rest of the models involve linear, exponential, and quadratic transformations to cater for non-stationarity of the location and scale parameters. Model M 1 uses a linear transformation of the location parameter, whereas model M 2 uses an exponential transformation of scale parameter. A quadratic transformation of the scale parameter is used in model M 3 . The models are nested and we carry out likelihood ratio tests based on the deviance statistic. This is useful for examining the fit of the models and choosing the best out of several candidate models. The deviance statistic is given as: where M j is the model tested against model M 0 and if D > χ 2 k we reject the null hypothesis and conclude that model M j provides a significant improvement over model M 0 .

Frequency Analysis Using the Poisson-Generalised Pareto Distribution
The frequency and intensity of occurrence of extremely high temperatures constitute the core of the analyses in this paper. This forms part of the results that could be useful to the planners and decision makers in the energy sector. This section discusses the use of the orthogonal and the reparameterisation approaches for estimating the frequency and intensity of the annual occurrence of extremely high temperatures.
The orthogonal approach is described in Smith [18] as the one that separately estimates the frequency and the intensity rates of the occurrence of extremes and hence difficult to interpret under non-stationary context. Due to this limitation, this paper considers time-homogeneous point process models for calculating intensity rates and frequency using the reparameterisation approach.
The Poisson-GPD is briefly discussed in this section. We show how the Poisson-GPD can be used to estimate the frequency of occurrences of extreme AMDT per year. Let Y j , j = 1, . . . , n be AMDT. The interest is in observing Y j > τ, where τ is a sufficiently high threshold. Let X 1 , . . . , X n be independent and identically distributed (i.i.d) random variables with the observed indices i for which X i > τ. In case these indices are rescaled to points j n , they are considerable as a point process of rescaled exceedance times on , the process converges weakly to a homogeneous Poisson process on The number of exceedances, denoted by n τ in some specified unit time, say a year, follows a Poisson distribution with mean θ. The observations Y 1 , . . . , Y n τ are then assumed to follow the GPD [18]. The stationary PP model M 0 is used for calculating the frequency of occurrence of extremely hot days. This is done using the reparameterisation approach that is discussed in Smith [18] and given in Eq. 9.
We show the relationship between the Poisson-GPD process with the survival function of the generalised extreme value distribution (GEVD) for annual maxima. If we let x > τ, then the probability that the annual maximum of the Poisson-GPD process is greater than x is given as: Now if are substituted in Eq. 8, we get the survival function of the GEVD, which is: The derivation in Eq. 8 results in the reparameterisation approach that is discussed in Smith [17] and given in Eq. 9 as follows: According to Smith [18], the GEV and GPD models are entirely consistent with one another above the threshold τ , and Eq. 9 gives an explicit relationship between the two sets of parameters σ * and λ. These parameters are then estimated in this paper asσ * andλ in the process of calculating the frequency and intensity of the annual occurrence of extremely hot days in South Africa. This is carried out using the stationary PP model that is fitted in Section 4.

Description of the Case Study
The modelling framework discussed in Section 2 is then applied to a real-life data set. The data that is used in this paper is a time series of AMDT that were collected by the South African Weather Service (SAWS) from the 28 weather stations in South Africa. The data was collected over the period 1 January 2000 to 30 August 2010 and was then divided into winter and non-winter seasonal versions, where non-winter is defined as the period from 1 September to 30 April of each year and the rest of the data is for the winter season [1,2]. Only the data for non-winter season is used in this paper for the purpose of modelling the frequency of occurrence of extremely high temperatures. There are several packages that are discussed in literature as relevant for addressing issue of missing observations in the data set. We impute the missing data using the R package "Mice" [19]. The use of this package is preferred because it performs multiple imputations based on fully conditional specifications that are implemented by the Mice algorithm described in Van Buuren et al. [19].
Suppose that x 1 , . . . , x n denote a series of AMDT that are i.i.d and let τ be a sufficiently high threshold. The values x (1) ≤ · · · ≤ x (k) are k positive residuals above τ if {x i : x i > τ} and the observations y j = x (j ) − τ , for j = 1, . . . , k are threshold excesses [12,20]. When the threshold is too high, there are few positive excesses above the threshold and hence a large variance [3,12,20]. A low value of the threshold leads to the violation of the asymptotic feature of the GPD, implying bias [3,12,20]. There are various threshold selection tools that are discussed in literature, including Pareto quantile plot, mean excess plot, threshold stability plot, and the mean residual life plot [3,12].
In this paper, we prefer the use of a non-parametric EVM model for the purpose of avoiding consequences of fixed threshold approach such as ignorance of uncertainties and inability to compare GPD or PP parameters of an entire model at different thresholds as discussed in Hu and Scarrott [21]. In addition to the use of non-parametric EVM models, we use threshold stability plots. Amongst various smoothing techniques, the penalised cubic smoothing spline (PCSS) function is used for non-linear detrending the data and for determining the sufficiently high time varying threshold.
The PCSS has become a very powerful and applicable smoothing technique over the last decade as emphasised in Krivobokova [22]. This non-parametric method can be viewed as a generalisation of smoothing splines with a more flexible choice of bases and penalties. The main benefit of using PCSS as emphasised in Krivobokova [22] is that it ties with various modelling frameworks including EVT, ridge regression, mixed models, and Bayesian statistics.
Initially, we fit a PCSS as a time-varying threshold which is given as: where x t is the temperature, f (t) is a PCSS function, and λ is a smoothing parameter that is chosen based on the generalised cross-validation (GCV) approach [23]. We then extract the residuals (excesses) above this threshold and fit a fixed threshold above the positive residuals (excesses above the time varying threshold) using the non-parametric EVM model proposed in Hu and Scarrott [21] (Fig. 1). The procedure for the application of non-parametric EVM model is to fit a kernel density to a bulk model that is given in Eq. 11, followed by fitting GPD to the tail of the distribution. The sufficiently high threshold is then determined by estimating τ as shown in Fig. 2. See MacDonald et al. [17] for details. The distribution function of the boundary corrected EVM model is given in Eq. 11.
In stationary sequences, extreme values can occur in clusters, for which asymptotic characterisations and methods of inference are detailed in Ferro and Segers [24]. The first step in making inferences is to identify clusters in the data, followed by declustering. However, prior to the process of declustering, the extremal index (reciprocal of the mean cluster size) which is an important parameter (0 ≤ θ n (τ ) ≤ 1) for measuring the degree of clustering of extremes in a stationary process needs to be calculated.
Apart from the extremal index being of interest in its own right, it is a crucial parameter for determining the limiting distribution of extreme values from the process as emphasised in Smith and Weissman [25]. Considering the point process of exceedance times over a sufficiently high threshold, the extremal index can be shown to converge asymptotically to a clustered Poisson process [25].
Smith [26] argues that the seasonality component is associated with short-range dependence since a hot day is likely to be followed by hot days and that the same applies to other seasons. The dependence feature leads to a weakness whereby the observations are clustered or grouped together, violating the i.i.d condition [26]. We decluster the exceedances above the threshold to get rid of short-range dependence. See the 'texmex' R package of Southworth and Heffernan [27] for theoretical and software details of the declustering algorithm. We apply the interval estimator method of Ferro and Segers [24] to decluster positive residuals above the threshold.
Let T denote a random variable on the positive integers whose distribution is given by: where θ ∈ (0, 1] and p ∈ (0, 1) may be viewed as F (τ n ).
From the mathematical details of the declustering algorithm discussed in Ferro and Segers [24], it then follows that: The first-order bias of θ at a threshold τ is θ(2−3θ/2)F (τ ). The relationship: which motivates the interval estimator given in Eq. 14.
whose first-order bias is 0, where τ is a sufficiently high threshold and T i denotes interexceedance times [24]. The extremal index, θ n (τ ) is calculated prior to the declustering and used for measuring the amount of clustering and 0 ≤ θ n (τ ) ≤ 1, where 1 θ n (τ ) is the limiting mean cluster size [24]. After determining the sufficiently high threshold and declustering, the stationary and non-stationary point process models that are discussed in Section 2 were then fitted to cluster maxima using the MLE method. The parameters were estimated in this paper using the method of maximum likelihood estimation (MLE) with the likelihood function that is given in Eq. 2. These procedures were carried out using R, a programming language for statistical computing [28].
The R packages used in this paper include, but not limited to, 'texmex' by Southworth and Heffernan [27], 'Evmix' by Hu and Scarrott [21], and 'evd' by Stephenson [29]. The stationary (time-homogeneous) model that is fitted and used as a base model in comparison with various non-stationary models is given in Eq. 3. The non-homogenous model with linear transformation of the location parameter is M 1 in Eq. 4. In Eqs. 5 and 6 we fit the non-homogeneous point process models with exponential and quadratic transformations of the scale.
The models that are used in this paper are nested and to that effect, the likelihood ratio tests are carried out based on the deviance statistic. The tests support fit of the stationary point process model. We further fitted the stationary GPD in comparison with the stationary PP model. The formal tests which are the Anderson-Darling test and the Cramérvon Mises test are conducted to assess goodness of fit of the GPD. These tests are conducted using the 'evd' package [29]. The Cramér-von Mises statistic W 2 and the Anderson-Darling statistic A 2 are calculated using Eqs. 15 and 16 that are respectively discussed in Hosking [30]. and where F (x (i) ) is the empirical distribution function at the ith point [30]. The frequency of occurrence of extremely hot days is the main focus of analysis in this paper as this information is useful to the planners and decision makers in the energy sector. We calculate and interpret the intensity of occurrence of extremely hot days based on the stationary point process model and the reparameterisation approach [18]. Figure 1 is a time series plot of AMDT with a time-varying threshold which is a PCSS. The smoothing parameter is chosen based on the GCV criterion and found to beλ = 1.028421e −08 . The detrended version of the data (residuals above the threshold) is given in Fig. 3 at the top left panel.

Results
It is noticeable in the panels of threshold stability plots in Fig. 3 that when threshold exceedsτ = 2.9134, the parameters begin diverging from their mean. Figure 2 shows the results of fitting the non-parametric EVM model and the threshold is found to beτ = 2.9134. The extremal index is estimated prior to the declustering process and found to be θ = 0.5908, which is large enough to give a low rate of exceedance. The rate of exceedance is 1/0.5908 = 1.69 ≈ 2, meaning that the cluster maxima occur in groups of approximately 2. From these findings, we establish that temperature tends to be very high on consecutive days but very hot spells tend not to last longer than 1 or 2 days. The original series contains 1407 observations and there are 81 identified cluster maxima (Figs. 4 and 5). Excesses above the threshold are found to be 144. The GPD is fitted to cluster maxima and the exceedances above the threshold are indicated in Fig. 6. Threshold diagnostic plots are given in Fig. 3 which shows on the top right panel, the threshold stability plot for extremal index. The bottom panel shows threshold stability plot for scale parameter on the left and threshold stability plot for shape parameter on the right. Further assessment of parameter estimates at several thresholds is given in Fig. 4 which shows at the top left panel, the threshold stability plot of logarithmic scale; threshold stability plot of shape parameter at the top right panel; and mean residual life plot at the bottom panel. Figure 5 shows quantile-quantile (Q-Q) plots for normalised interexceedance times against standard exponential quantiles. The Q-Q plots are mainly useful for the purpose of assessing several thresholds for estimated value of extremal index. The chosen threshold (τ = 2.9134) is compared with various candidate thresholds (τ = 2.5,τ = 2.4, and τ = 2.35) in Fig. 5. Although the Q-Q plots forτ = 2.9134 andτ = 2.5 look quite similar, the one forτ = 2.9134 appears to be more approximately linear. The Q-Q plots shown in Fig. 5 show that a threshold ofτ = 2.91 gives a good fit of the model to the AMDT data.

Results from Fitting the GPD
The GPD is fitted to cluster maxima of AMDT using the MLE method and the results are summarised in Table 1. The parameter estimates are found to beθ = 0.9168(0.1285) andξ = −0.1855(0.0956) with standard errors in parentheses, respectively. The negative value of the shape parameter reveals evidence that W ξ belongs to the Weibull class of the GPD and hence the appropriateness of the Weibull class of distributions in modelling maximum daily temperature in South Africa. A 95% confidence interval for the shape parameter ξ is calculated to justify that the shape parameter is indeed estimated by a negative value. This is accomplished as follows: ξ ± z α/2 × (standard error) ⇒ −0.1855 ± 1.96 × 0.0956 = (−0.3729; 0.001876). The shape parameter is enclosed within the interval (−0.3729; 0.001876) at a 95% level of confidence. Since this confidence interval contains zero, we carry out a formal test of hypothesis at 5% level of significance to test whether the shape parameter is zero or not. We use the likelihood ratio (LR) test under the null hypothesis that the value of the shape parameter is 0. The null and alternate hypotheses of the test are given as follows: The deviance statistic of the LR test is given in Eq. 7. We prefer the use of the modified LR test that is suggested by Hosking [30] with the modified test statistic that is denoted by LR * . It is argued that LR * gives a more precise approximation to the asymptotic distribution of LR [30].
where LR follows χ 2 1,(α) under H 0 , implying that LR = 3.841. Since there are 2 543 observations in the data set, substituting n = 2543 in Eq. 17, it follows that LR * = 3.8368. The rule is to reject H 0 if LR * > χ 2 1,(1−α) . Since    Fig. 7 show a good fit of the GPD to cluster maxima. The shaded area on the probability and the quantile plots show the linearity of the dots to guide in suggesting good fit of the GPD to the AMDT data. The same applies to the return level plot. The estimated rate of excess is calculated as the ratio of the number of cluster maxima and the number of the observations above the time-varying threshold, which is 81 144 ≈ 0.5625. Since θ = log(σ τ ), this implies that the scale parameter is estimated asσ τ = eθ = e 0.9168 ≈ 2.5013.
The estimation of target parameters in statistical modelling is usually associated to some level of uncertainty which needs to be quantified. One of the techniques that are useful for assessing uncertainty in parametric and nonparametric estimation is the bootstrap resampling approach  [31]. We apply the bootstrap resampling approach in this paper for assessing uncertainty on the estimates of GPD parameters ( Table 2). The bootstrap statistics for GPD estimates are calculated and summarised in Table 3. The biases for the scale and shape estimates are given by 0.0320 and −0.0461, whereas the standard deviations are estimated as 0.1552 for scale estimate and 0.1124 for estimate of the shape. These statistics are small in comparison with the theoretical values, implying a less level of uncertainty. The bootstrap densities for the estimates of the scale and shape respectively are given in Fig. 8, which shows good fit of the density functions to the estimated values.
In this paper, we conduct both the Cramér-von Mises and the Anderson-Darling tests to assess fit of the GPD toward modelling AMDT in South Africa. The results of the tests are given in Table 2, where column 2 shows the test statistics and the p values of the test are given in column 3. We wish to test the hypothesis H 0 : The random sample x 1 , x 2 , . . . , x n comes from the GPD, against H 1 : H 0 is false. Looking at the Cramér-von Mises test, the p value 0.6187 is large in comparison with 5% level of significance.
We fail to reject H 0 at 5% level of significance and conclude that the random sample x 1 , x 2 , . . . , x n comes from the GPD. This decision is supported by the test statistic 0.0506 in comparison with the critical region z α/2 = 1.645. The Anderson-Darling test has the p value 0.5919 and the test statistic 0.3809. In this test, we fail to reject the null hypothesis. The conclusion based on these formal tests is that the fit of the GPD is appropriate for modelling AMDT in South Africa. Table 4    out based on the deviance statistic that is given in Eq. 7 (Fig. 9).

Results from Point Process Modelling Approach
Model M 0 is used as a base model and the rest of the models are compared to M 0 to assess significant improvements. It follows that D(1, 0) = 0.0016, D(2, 0) = 0.0004, and D(3, 0) = 0.0094. These deviance statistics are small in comparison with χ 2 1,(0.05) = 3.841. We fail to reject the null hypothesis at 5% level of significance and conclude that model M j , j = 1, 2, 3 does not provide any significant improvement. From Eq. 9, it follows thatσ * = 1.0633 and λ = 15.1471 ≈ 15, which implies that the frequency of the occurrence of extremely hot days is approximately 15 times per annum. This is the annual number of days in which Eskom experiences an extremely high demand of electricity.
The frequency analysis of the occurrence of cluster maxima above the sufficiently high thresholdτ = 2.91 is discussed. Figure 10 is a bar graph that shows monthly frequency of occurrence of cluster maxima aboveτ = 2.91 over the period 2000 to 2010. This graph reveals that the least cluster maxima is 4 and it occurred during October, followed by 6 which occurs in June. The modal occurrence of the cluster maxima is 7 which occurred in August, September, and November. January, February, and May had the occurrence of 8 cluster maxima whereas the highest count of 11 was encountered in July.

Discussion
Stationary and non-stationary point process models were fitted so as to model AMDT in South Africa. The PCSS function was used for non-linear detrending the data, after which residuals above the threshold were extracted. The boundary-corrected EVM model was fitted to determine the fixed threshold. The data had revealed short-range dependence and heavy seasonality which led to declustering of exceedances. The interval estimator method was used for declustering the data. Stationary and nonstationary candidate models were fitted using the MLE method. Model adequacy was checked using graphical diagnostic tools which showed support of the stationary model. Inclusion of covariate did not result in any improvement in the models. We further fitted the stationary GPD model to compare its fit with that of the stationary point process model. The goodness-of-fit of the GPD was checked using the formal tests which are the Cramér-von Mises test and the Anderson-Darling test. It is established based on these tests that the GPD is an appropriate model for modelling AMDT in South Africa.
The fit was also supported by the diagnostic plots. We then used the stationary point process model for calculating the frequency of the occurrence of extremely hot days based on the reparameterisation approach. It has been found in this study that the number of extremely hot days occur at the rate of 15 times per annum. These are the days in which Eskom experiences an extremely high demand for electricity. We further discussed the frequency analysis of monthly occurrence of cluster maxima above the threshold over the period of the study (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010). It is established that the least occurrence of cluster maxima was during October and that the modal occurrence of 7 was encountered during the months August, September, and November. There was the highest occurrence of cluster maxima in July.

Conclusions
This paper has discussed the modelling of temperature extremes using the point process approach. The modelling framework uses penalised cubic smoothing spline as a time-varying threshold, followed by fitting the extreme value mixture model to determine a fixed threshold. The exceedances above the fixed threshold are then declustered. The stationary and non-stationary point process models, and the stationary GPD are then fitted to cluster maxima. Inclusion of trend or covariate in the point process models that are used for modelling temperature extremes in South Africa during the non-winter season (1 September 2000 to 30 April 2010) does not result in any significant improvement of the models.
Use of the formal tests (Anderson-Darling test and Cramérvon Mises test) for testing goodness of fit for the models supports the fit of stationary GPD in modelling temperature extremes in South Africa during the non-winter season (1 September 2000 to 30 April 2010). The use of the reparameterisation approach of the Poisson-GPD has established that the frequency of the occurrence of extremely hot days over the non-winter season (1 September 2000 to 30 April 2010) is approximately 15 times per annum. The results of the study based on the data for the years 2000-2010 establish that temperature tends to be very high on consecutive days but very hot spells tend not to last longer than 1 or 2 days.
Major findings and contributions of this study are as follows: -A major contribution for this paper is on the use of the PCSS function for non-linear detrending temperature observations together with the use of the non-parametric EVM model in determining the fixed threshold. -The methodology and results may be useful to power utility companies based on the fact that the frequency analysis of the occurrence of extremely high temperatures is directly relevant to the modelling of electricity demand based on the fact that the frequency and the duration of a hot spell are the important issues from the viewpoint of both the energy statisticians and the disaster risk analysts. -The major results of this paper establish that temperature tends to be very high on consecutive days but very hot spells tend not to last longer than 1 or 2 days, a remark that may be relevant to the meteorologists and disaster risk managers. -The reparameterisation approach of the Poisson-generalised Pareto distribution (Poisson-GPD) has been used to determine the number of extremely hot days, which is 15 per annum. This remark is as well important to the meteorologist for the purpose of predicting the future occurrence of hot spells. This is the number of days on which Eskom experiences extremely high demand for electricity (Fig. 11).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.