Most statistical analyses focus on measures of central tendency, such as the mean or the median. For rare or extreme events, however, interest lies in the tails of the underlying distribution of the data. Such events usually appear as outliers in a dataset and, in most cases, are discarded during data cleaning and analysis. Natural hazards, natural disasters and most pandemic diseases, such as the 1918 Spanish Influenza and the recent coronavirus disease (Covid-19), are examples of rare events. The 21st century has been characterised by an increase in the number, frequency and intensity of natural hazards (Maposa et al. 2017). This increase in hazards such as heat waves, cold waves, tornadoes, hurricanes, floods and droughts is generally attributed to climate change (Diriba and Debusho 2020; Diriba et al. 2015; Maposa et al. 2017). Extreme value theory (EVT) is the branch of statistics commonly used to analyse extreme events (Acero et al. 2014; Bhagwandin 2013; Coles 2001; Ferreira and de Haan 2015; Heffernan and Tawn 2004; Keef et al. 2013; Maposa et al. 2017; Nemukula et al. 2018).
There are two fundamental approaches in EVT modelling: block maxima and peaks-over-threshold (POT) (Ferreira and de Haan 2015). The present study adopts the POT approach and applies it to model extreme temperature in the Limpopo province of South Africa. The POT approach has several variations with regard to the selection of the threshold. Common approaches include the mean residual life (mean excess) and parameter stability plots used in Keef et al. (2013), which usually depend on the user's subjective visual interpretation of the plots, and the automated threshold selection of Thompson et al. (2009), which is based on the distribution of the difference of parameter estimates when the threshold is changed. More recent approaches include that of Nemukula et al. (2018), where a penalised cubic smoothing spline is used to perform a nonlinear detrending of the data prior to fitting bivariate threshold excess models to positive residuals above the threshold, and that of Sigauke and Bere (2017), where a time-varying threshold is used with the generalised Pareto distribution (GPD) to capture the changing climatic effects in the data.
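As a minimal illustration of the POT idea described above, the sketch below extracts the excesses over a fixed threshold and estimates the GPD shape and scale by the method of moments. The synthetic exponential data, the 90th-percentile threshold and the function names are assumptions made purely for illustration; they are not the data, the threshold choice or the estimation method of the present study.

```python
import random
import statistics


def threshold_excesses(values, threshold):
    """Return the amounts by which observations exceed the threshold."""
    return [x - threshold for x in values if x > threshold]


def gpd_moment_estimates(excesses):
    """Method-of-moments estimates (shape, scale) for the GPD.

    Uses mean = scale / (1 - shape) and
    variance = scale**2 / ((1 - shape)**2 * (1 - 2 * shape)),
    which are valid only when shape < 0.5.
    """
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale


# Hypothetical data: a base level of 30 plus exponential noise with mean 2.
rng = random.Random(42)
data = [30.0 + rng.expovariate(0.5) for _ in range(5000)]

# Illustrative fixed threshold: the empirical 90th percentile.
threshold = sorted(data)[int(0.9 * len(data))]
excesses = threshold_excesses(data, threshold)
shape, scale = gpd_moment_estimates(excesses)
print(f"threshold={threshold:.2f}, n_excesses={len(excesses)}")
print(f"shape={shape:.3f}, scale={scale:.3f}")
```

Because exponential excesses correspond to a GPD with shape zero, the estimated shape should be close to 0 and the estimated scale close to 2 for this synthetic sample.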
The present study combines the latter two approaches to perform an extreme value analysis of maximum temperatures in the Limpopo province of South Africa using a bivariate time-varying threshold approach. Literature of this nature, particularly with application to maximum temperature extremes, is scarce in the province and in South Africa as a whole, and is limited worldwide. This approach therefore brings a novel EVT application to maximum temperature extremes in the province (Figure 1).
Background
Many areas of society throughout the world are susceptible to the effects of extreme temperatures (Keelings and Waylen 2015; Nemukula 2018; Raghavendra et al. 2019). Temperature extremes such as heat waves and cold waves are deadly natural hazards, although they develop more slowly and are more difficult to detect than a hurricane or a cyclone (DEA 2019; Henderson and Muller 1997; Sigauke and Nemukula 2018). Heat waves are reportedly occurring more frequently across much of the globe, including South Africa, and under a warming climate they are expected to increase in frequency, intensity and duration (Coumou and Robinson 2013; Coumou et al. 2013; Keelings and Waylen 2015; Sigauke and Nemukula 2018). Climate change is regarded as the main contributing factor to recent increases in global temperatures (Winter 2016). Worldwide, temperature extremes have a major impact on the agricultural, economic, health and energy sectors (Raggad 2018; Reddy and Vincent 2017; Sigauke and Nemukula 2018). For instance, extremely high temperatures such as heat waves may result in loss of plant and animal species, losses of economic goods, high energy demand for air conditioning, and death resulting from heart attacks, heat cramps, fainting, heat stroke and heat exhaustion (Makate et al. 2019; Sigauke and Nemukula 2018). Extremely low temperatures such as cold waves may cause water pipelines to freeze and burst, raise the demand for fuels and electricity, leave animals unable to graze so that they die of starvation, cause frostbite in humans and animals, and lead to other serious medical ailments (Diriba et al. 2015; Henderson and Muller 1997).
In Africa, the impact of a changing climate varies by region (Sigauke and Nemukula 2018; Wright et al. 2014; Yamba et al. 2011). By the end of the century, Southern Africa is expected to experience an average temperature increase about two degrees Celsius higher than the predicted average global increase (Wright et al. 2014). Between 1980 and 2015, Southern Africa experienced 491 climate disasters (meteorological, hydrological and climatological) that resulted in 110,978 deaths, left 2.49 million people homeless and affected an estimated 140 million people (Reddy and Vincent 2017). Changing weather conditions increase electricity demand because heating systems are used in winter and air conditioning appliances in summer (Sigauke and Nemukula 2018). This creates a serious problem, particularly in South Africa, where the national electricity supplier, ESKOM, is already struggling to meet the nation's energy demand (Chikobvu and Sigauke 2013; Sigauke and Nemukula 2018). ESKOM has experienced increased demand for electricity over recent decades, which has led to rolling blackouts (Hohne et al. 2019). According to Yamba et al. (2011), energy demand is expected to change drastically in South Africa as a result of increasing temperatures and changing weather patterns, consequently affecting heating and cooling demands.
Extreme climate and weather events such as heat waves, cold waves and drought have negative impacts on society, the environment and resource management, particularly in developing countries like South Africa (Gebrechorkos et al. 2019; Sigauke and Nemukula 2018; Wolf et al. 2010). Climate change has resulted in rising temperature trends with associated changes in temperature extremes across the globe, which have the potential to affect human health. It is generally anticipated that as the planet heats, climate variability will increase (Krugger and Sekele 2013; Reddy and Vincent 2017). Over the last five decades, South Africa has experienced a considerable increase in mean annual temperatures, with hot extremes increasing and cold extremes decreasing in frequency across the country (DEA 2019; Diriba and Debusho 2020; Mbokodo 2017). Temperature is one of the main climatic elements that can indicate climate change (Toros et al. 2019; Worku et al. 2019; Wright et al. 2014). Global warming and its associated increase in temperature extremes pose a substantial challenge to natural systems. It is widely believed that the changing temperature due to global warming is permanently changing the earth’s climate. In particular, an increase in temperature is likely to lead to a global increase in drought conditions, a decrease in water supplies due to evapotranspiration and an increase in agricultural demand (Diriba et al. 2015; Nhamo et al. 2019; Ochanda 2016).
Limpopo province, where the present study is carried out, is one of the nine provinces of South Africa and one of the hottest in the country (Krugger and Shongwe 2004; Phophi et al. 2020). The province is among the lowest-ranked in terms of regional gross domestic product (GDP) per capita and is the province most vulnerable to climate change impacts. Drought, driven by high temperatures and unreliable rainfall, is one of the main problems affecting the agricultural sector in the province (Maponya and Mphandeli 2012b; Mpandeli and Maponya 2013; Tshiala and Olwoch 2010). Recent high temperatures in Limpopo province were experienced in the western bushveld and lowveld in October 2019 (Phophi et al. 2020). Such extremely high temperatures can affect agricultural production in the province, leading to scarce food and water resources, which is a serious threat to a country like South Africa, where the population is growing rapidly (Makate et al. 2019; Maponya and Mphandeli 2012a; Tshiala and Olwoch 2010). South Africa is also concerned about public health during extreme hot events and about how the impact of these events may change in the future (Mbokodo 2017; Wright et al. 2014). For instance, extremely high temperatures and prolonged heat waves can damage agricultural production, increase energy and water consumption, harm human well-being and even cause loss of livestock, plant and human lives (Chikobvu and Sigauke 2013; Reddy and Vincent 2017). During the 21st century, the global surface temperature has increased by about \(0.85\,^\circ{\text{C}}\) and many areas have experienced significant warming (Toros et al. 2019). Krugger and Shongwe (2004) found a considerable increase in temperature between 1960 and 2003 for the three stations Bela Bela, Polokwane and Messina (locally known as Musina) situated in the Limpopo province in north-eastern South Africa.
The present study is built against this background coupled with the challenges brought about by temperature extremes in the Limpopo province (Fig. 1).
Literature review on extremal dependence modelling
Several studies have modelled extremal dependence with application to various variables including temperature, wind speed, rainfall, air pollution, insurance claims, financial losses and many others. Southworth et al. (2020) give a detailed computational approach to conditional modelling of multivariate extreme value data using the R package ‘texmex’. The authors cautioned that dependence between variables in the body of the data does not necessarily imply dependence in the extremes. Another issue that makes multivariate extreme value modelling more complicated than the univariate case is that, for an observation to be considered a multivariate extreme, it has to be extreme in all components simultaneously. Southworth et al. (2020) explored and gave a detailed interpretation of pairwise extremal dependence and conditional multivariate extreme value modelling using the approach of Heffernan and Tawn (2004), which proceeds by first fitting GPD models to the marginal variables before estimating the dependence structure. Like the GPD model for excesses over a given threshold, the modelling approach of Heffernan and Tawn (2004) also conditions on a variable exceeding a predetermined threshold. The present study closely follows the approach of Heffernan and Tawn (2004) and the R computational approach of Southworth et al. (2020). More details on the modelling framework of these approaches are given in the next section on models.
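For orientation, the two stages described above are commonly written in the following generic form. This is a sketch only: the symbols \(X\), \(u\), \(\sigma\), \(\xi\), \(Y_i\), \(Y_{-i}\), \(a\), \(b\) and \(Z\) are introduced here for illustration and need not match the notation used later in the paper.

```latex
% Stage 1: GPD tail model for one marginal variable X above a threshold u
\[
  \Pr(X > x \mid X > u)
    = \left(1 + \xi\,\frac{x - u}{\sigma}\right)_{+}^{-1/\xi},
  \qquad x > u,\quad \sigma > 0 .
\]

% Stage 2: conditional dependence model after transformation to common
% margins, given that the conditioning variable Y_i is large
\[
  Y_{-i} \mid \{Y_i = y\} = a\,y + y^{b} Z ,
  \qquad y > u_{Y_i} .
\]
```

Here \(Y_{-i}\) denotes the remaining variables, \(a\) and \(b\) are dependence parameters and \(Z\) is a residual that is assumed independent of \(Y_i\) and whose distribution is not specified parametrically. On Laplace margins, as in Keef et al. (2013), the parameters are constrained to \(a \in [-1, 1]\) and \(b < 1\).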
Another important issue in EVT methodology is the choice of a threshold when using the POT approach. Apart from the threshold selection approaches of Sigauke and Bere (2017) and Thompson et al. (2009) mentioned in the introductory section, Verster and Raubenheimer (2020) more recently proposed a generalised Bayesian model that uses the properties of the posterior distribution to select an optimal threshold without visual inspection. Their approach is based on the Topp-Leone Pareto (TLPa) distribution and was shown to perform well. In another study on threshold choice, Minkah and de Wet (2014) investigated constant versus covariate dependent thresholds in the POT approach and proposed a covariate dependent threshold based on expectiles. They argued that, although no threshold choice method is universally the best, a strong argument against the use of a constant threshold is that an observation considered extreme at one covariate level may not qualify as extreme at another covariate level. The proposed approach was compared with the constant and quantile regression thresholds in a simulation study based on exponential growth data for the estimation of the GPD tail index. The findings of Minkah and de Wet (2014) revealed that the covariate dependent threshold approach outperformed the other methods for smaller to medium values in the data, while for larger values of the response variable the constant threshold outperformed the other methods. A slightly different method is the pragmatic automated threshold selection of Thompson et al. (2009), which is based on the distribution of the difference of parameter estimates when the threshold is changed.
The methods of Thompson et al. (2009) and Minkah and de Wet (2014) are similar in that the automated threshold selection can also be extended to depend on a covariate value such as the wave direction cosine. In a separate study, Sigauke and Bere (2017) used a GPD with time-varying covariates and thresholds to model daily peak electricity demand for South Africa. Their threshold selection approach uses a penalised cubic smoothing spline with a constant shift factor as a time-varying threshold. They used the intervals estimator method to decluster observations that exceed the threshold, and further included temperature as a covariate in the GPD parameters in order to explore its influence on electricity demand. Their findings showed that the GPD model fitted the data better than the generalised extreme value (GEV) distribution. The present study adopts the GPD time-varying threshold selection approach of Sigauke and Bere (2017) to cater for climate change effects in the maximum temperature data. Whereas Sigauke and Bere (2017) applied the method in univariate modelling, the present study extends its use to conditional multivariate extremal dependence modelling.
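The idea of a time-varying threshold built from a smooth trend plus a constant shift can be sketched as follows. This is only an illustration under stated assumptions: a centred moving average stands in for the penalised cubic smoothing spline of Sigauke and Bere (2017), and the synthetic monthly series, the window length and the shift value are all hypothetical.

```python
import math
import random


def moving_average(series, window):
    """Centred moving average; the window shrinks near the ends."""
    half = window // 2
    smooth = []
    for t in range(len(series)):
        lo = max(0, t - half)
        hi = min(len(series), t + half + 1)
        smooth.append(sum(series[lo:hi]) / (hi - lo))
    return smooth


def time_varying_excesses(series, trend, shift):
    """Return (time index, excess) pairs over the threshold trend[t] + shift."""
    return [
        (t, x - (m + shift))
        for t, (x, m) in enumerate(zip(series, trend))
        if x > m + shift
    ]


# Hypothetical monthly maxima: slow warming trend, annual cycle and noise.
rng = random.Random(1)
n_months = 360
series = [
    28.0 + 0.005 * t + 4.0 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1.5)
    for t in range(n_months)
]

trend = moving_average(series, window=25)  # stand-in for the spline fit
shift = 3.0                                # illustrative constant shift
threshold = [m + shift for m in trend]     # threshold varies with time
excesses = time_varying_excesses(series, trend, shift)
print(f"{len(excesses)} of {n_months} months exceed the time-varying threshold")
```

Because the threshold follows the fitted trend, an observation is judged extreme relative to the prevailing level at that time rather than relative to a single constant level for the whole record.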
A study closely related to the present one in multivariate extreme value theory (MEVT) is that of Nemukula et al. (2018), who used bivariate threshold excess models for temperature extremes in the Limpopo province at three meteorological stations: Lephalale, Polokwane and Thohoyandou. As in the present study, Nemukula et al. (2018) used a penalised cubic smoothing spline to perform a nonlinear detrending of the temperature data before fitting bivariate threshold excess models to positive residuals above the threshold. The present study, however, extends the approach of Nemukula et al. (2018) by using a time-varying threshold instead of a constant threshold to capture the climate change effects in the monthly maximum temperature data series. Additionally, apart from the Polokwane meteorological station, which is common to both studies, three new stations, Mara, Messina (also known as Musina in the local language) and Thabazimbi, are used in the present study (Fig. 1).
Another study of importance to the present one concerning MEVT is that of Tilloy et al. (2020), who evaluated the efficacy of bivariate extreme value modelling approaches in estimating the risks generated by multi-hazard scenarios. Tilloy et al. (2020) fitted six distinct stochastic copula models to synthetic datasets and concluded that there is no one-size-fits-all approach in bivariate extreme value modelling. No single model was able to fit their synthetic data for all the parameters; instead, several models were appropriate. Tilloy et al. (2020) limited their study to stochastic copulas and the bivariate case of multivariate models based on Heffernan and Tawn (2004). In their evaluation of bivariate extreme dependence, they discussed in detail asymptotic dependence and asymptotic independence, as well as tail dependence measures. They argued that extremal dependence in practice tends to weaken at higher levels, so that dependence between variables may be observed in the body of the joint distribution while the multivariate distribution is in the maximum domain of attraction of independence. In their discussion of the bivariate models, which include copulas and conditional extremes, Tilloy et al. (2020) advocated the conditional extremes model of Heffernan and Tawn (2004) and Keef et al. (2013), which uses Laplace margins. The conditional extremes model can accommodate both asymptotic dependence and asymptotic independence (see Heffernan and Tawn 2004; Keef et al. 2013, for more details). The present study also adopts the Heffernan and Tawn (2004) conditional extremes model with the addition of a time-varying threshold.
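The weakening of dependence at higher levels can be illustrated with a simple empirical tail dependence measure, here taken as \(\chi(u) = \Pr\{F_Y(Y) > u \mid F_X(X) > u\}\). The sketch below is an assumption-laden illustration rather than the procedure of Tilloy et al. (2020): the function names are hypothetical, and the data are a simulated Gaussian pair with correlation 0.7, chosen because Gaussian dependence with correlation below one is asymptotically independent.

```python
import math
import random


def to_uniform_ranks(values):
    """Map observations to (0, 1) by their ranks (empirical CDF)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = position / (n + 1)
    return ranks


def empirical_chi(x, y, u):
    """Estimate chi(u) = P(F_Y(Y) > u | F_X(X) > u) from paired samples.

    The level u must leave at least one observation of x above it.
    """
    ux = to_uniform_ranks(x)
    uy = to_uniform_ranks(y)
    n_x = sum(1 for a in ux if a > u)
    n_both = sum(1 for a, b in zip(ux, uy) if a > u and b > u)
    return n_both / n_x


# Hypothetical data: bivariate Gaussian sample with correlation rho.
rng = random.Random(7)
n = 20000
rho = 0.7
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rho * a + math.sqrt(1.0 - rho**2) * b for a, b in zip(x, noise)]

chi_by_level = {u: empirical_chi(x, y, u) for u in (0.5, 0.9, 0.99)}
for u, chi in chi_by_level.items():
    print(f"u={u:.2f}: chi(u)={chi:.3f}")
```

For this pair the estimates should decrease as \(u\) approaches one, even though the variables are clearly dependent in the body of the distribution. Under asymptotic dependence, by contrast, \(\chi(u)\) would settle at a positive limit.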
Research highlights
This paper addresses issues related to climate change, global warming and, in particular, maximum temperature extremes in the Limpopo province of South Africa. The study combines two main approaches: the bivariate conditional extremes model (Heffernan and Tawn 2004; Keef et al. 2013; Southworth et al. 2020) and the time-varying threshold (Sigauke and Bere 2017). Conditional extremes modelling is crucial in the study of the dependence structure among several variables. Conditioning on one variable helps to identify significant positive (or negative) extremal dependence of the remaining variables on large values of the conditioning variable. This paper presents an application of the bivariate threshold excess modelling approach, with a positive shift factor as a time-varying threshold, to monthly maximum temperature extremes at four meteorological stations in the Limpopo province of South Africa, namely Mara, Messina, Polokwane and Thabazimbi. Among the major findings were the significant strong positive extremal dependence of Thabazimbi on large temperature values at Mara and the strong negative extremal dependence of Polokwane on large temperature values at Messina.
The main contribution of this paper is twofold. First, a penalised cubic smoothing spline is used to perform a nonlinear detrending of the temperature data prior to fitting bivariate threshold excess models based on Laplace margins to positive residuals above the threshold. Second, a positive shift factor is used as a time-varying threshold to capture climate change, seasonality and/or cyclic effects in the data. The existing gap in the literature was in combining these two approaches using conditional extremes dependence modelling. The overall significance of this bivariate conditional extremes dependence modelling study lies in quantifying the dependence effects of maximum temperature extremes among the various meteorological stations in order to provide useful information for planning by climatologists, meteorologists, agriculturalists, decision-makers and planners in the energy sector.
The rest of the paper is organised as follows: Sect. 2 gives the theoretical framework of the statistical models considered in this study which include conditional multivariate extreme value modelling, threshold selection, bivariate threshold excess model, Laplace marginal transformation, as well as the data and variables. Sect. 3 presents empirical results and a comprehensive discussion of the results. The concluding remarks and areas for future research are presented in Sect. 4.