1 Introduction

The analysis of extreme values in climatological time series is an area of intense scientific activity (Trenberth et al. 2015; Easterling et al. 2016). Series of maximum temperatures or maximum rainfall, which record extreme weather and climate events, are examples of this type of data (Mueller and Seneviratne 2012; Alexander 2016). High temperatures are among the most frequently investigated extreme events (heat waves, thermal stress, atmospheric, hydrological, soil and agricultural drought), and they affect human society, agriculture, water resources, energy demand and human mortality (Allen et al. 2010; Christidis et al. 2011). They also have an impact on the environment: for example, some animal species lose their natural habitats and the diversity of ecosystems is reduced, especially that of tropical biomes (Bailey and Van de Pol 2016).

Extreme Value Theory (EVT) provides a firm theoretical foundation for statistical models describing extreme events. The traditional approach consists in fitting probability distributions to variables over the entire range of their values. In the process of estimating the distribution parameters, a better fit is obtained in the range where most of the observations lie (Guedes-Soares and Scotto 2004). This approach is therefore poorly suited to the analysis of extreme observations. In methods based on the Generalized Extreme Value (GEV) distribution, the observations close to the central value are omitted and only extreme values are used to estimate the parameters of the theoretical distributions (Gençay and Selçuk 2004). The GEV distribution is widely used to model extreme data in environmental sciences and in many other fields (Reiss and Thomas 2007).

Extreme air temperatures have been modeled successfully with the EVT and GEV distributions for different regions of the world. The GEV distribution was applied to develop models of extreme air temperatures for Penang (Hasan et al. 2012), Cameroon (Ayuketang and Joseph 2014), Ghana (Sampson and Kwadwo 2019) and Kenya (Wambua et al. 2020), among others. For example, Meehl and Tebaldi (2004) presented heat wave modeling results for Chicago and Paris, which confirmed that there is a distinct geographic pattern of future changes in heat waves. Lyon (2009) applied EVT methods and the GEV distribution to assess summer drought and heat waves in Southern Africa, Nemukula and Sigauke (2018) modeled the average maximum daily temperature using the “r” largest order statistics for South African data, and Wang et al. (2013) investigated historical changes in Australian temperature extremes by analyzing extreme value distributions.

Climate has a strong influence on triggering extreme events such as high air temperatures and severe droughts, which aggravate biome degradation (Hatfield and Prueger 2015; Panisset et al. 2018). The evolution of biomes depends on the relationship between phytophysiognomy and climate variables (Coutinho 2006; Smith 2011). In many parts of the world the boundaries of tropical biomes (forests and savannas) are changing as a result of climate change and human-caused degradation (Woodward et al. 2004).

Climate and land-use changes (mainly deforestation) synergistically increase the frequency and intensity of drought-related fires in tropical regions and lead to the dominance of grass at forest edges, a process known as savannization. An additional factor is ozone (O3), produced by the action of sunlight on atmospheric pollutants, which is phytotoxic and well known for its damaging effect on vegetation (Cirino et al. 2013; Souza et al. 2020a, b). Bioaerosols (enzymes, viruses or debris) also play a significant role in air pollution because they can be pathogenic. The toxicity of bioaerosols has a negative impact on human life, causing acute adverse reactions and various types of diseases, and hence poses a challenge to health and the geo-environment (Gollakota et al. 2021). Although aspects of bioaerosols such as identification and quantification have been studied, research is still at an incipient stage, mainly in terms of understanding their behavior under the conditions of global warming and anthropogenic activity. Ambade et al. (2021a, b), on the basis of studies of air pollutants (i.e. PM2.5 and black carbon, BC) and polycyclic aromatic hydrocarbons (PAHs) in India, and Souza et al. (2021), by analyzing the tropospheric concentration of NO2 in Brazil, observed a significant reduction in the concentration levels of pollutants and their source distribution. They pointed out that the recent lower concentrations of BC, PAHs, PM2.5 and NO2 result from the series of lockdowns implemented by national governments to contain COVID-19 (Chelani and Gautam 2021). On normal days, however, the source profile of PAHs and NO2 was attributed to biomass and coal burning and vehicle emissions as primary sources, with very strong correlations between the variables and impacts on global warming (Ambade et al. 2021c; Maharjan et al. 2021). Ambade et al. (2021d), using PAH diagnostic ratios and principal component analysis (PCA), showed that the main sources were coal and wood combustion, as well as vehicular emissions from diesel and gasoline, at all sampling sites. The composition of PAHs shows significant seasonal variability, which is mainly attributed to changes in emission sources.

The degradation or reduction of natural vegetation in biomes due to climate change is likely to have serious consequences for the natural environment and the inhabitants of the region (Silva Dias et al. 2002; Lyra et al. 2017). These include loss of biodiversity, impacts on rainfall regulation, the water balance and the carbon balance, and a reduction in the ecosystem services that vegetation can provide (Agostinho et al. 2005; Salazar et al. 2007). PAHs, which have also been detected in surface water, groundwater and estuary sediments, may pose a major danger to the natural environment, water quality and human health (Ambade et al. 2021e).

Despite the importance of this topic in the subtropical region, few studies can be found in the literature; the majority concern the Amazon rainforest and temperate regions, where forest-monitoring studies are already consolidated. In the Midwest and North regions of Brazil, an increase in the intensity of forest fires is largely attributed to an increase in air temperature and a decline in rainfall, as well as to more intensive land use (Teodoro et al. 2016). Although there are studies in the literature on modeling extreme air temperature values for Brazil, there is little or no published research on modeling maximum temperature series with the GEV distribution in the Midwest region of Brazil, especially in the state of Mato Grosso do Sul. For the Atlantic Forest, Pantanal and Cerrado biomes in Mato Grosso do Sul, the information is restricted to simplified studies (such as the use of independent variables) and relatively short data series. Owing to the lack of spatial information compatible with the scales of the biomes, the analyses performed are concentrated in the regions where information on each biome is available. While this limitation prevents generalization for a particular biome, it also serves as a warning about the lack of information at scales compatible with the large areas of these biomes. There is a noticeable lack of information for the Pantanal biome, in contrast to the greater body of information available for the Amazon and, secondarily, the Cerrado. Studies on the Atlantic Forest have developed only recently, and they still tend to focus on a few areas.

In order to contribute to the understanding of the microclimate behavior of the biomes, our study models the historical air temperature series with probability distribution functions and compares their patterns among biomes. The general purpose of this article is to identify maximum extreme air temperatures in the state of Mato Grosso do Sul in Brazil based on the assumptions of Extreme Value Theory. To the best of the authors’ knowledge, there is a lack of research related to this issue.

The specific purposes include: 1) assessing the goodness-of-fit of the estimated Generalized Extreme Value (GEV), Gumbel (GUM) and Log-Normal (LN) distributions to monthly historical series of maximum temperatures of the Cerrado, Pantanal and Atlantic Forest biomes, 2) identifying the distribution that provides the best results based on different criteria, such as the corrected Akaike information criterion (AICc), the Bayesian information criterion (BIC), the root mean square error (RMSE) and the coefficient of determination (R2) for each month and each biome, 3) calculating the maximum temperatures expected in the biomes for return times of 10, 20, 30, 40, 50 and 100 years.

Integrated studies that make it possible to understand the connections between the biological functioning of vegetation and the climate are essential in a scenario where climate change is already altering the basic functional processes of biome ecosystems.

2 Materials and Methods

2.1 Area of study

The state of Mato Grosso do Sul is located in the Midwest region of Brazil and covers approximately 358.16 thousand km2 (Fig. 1). Agriculture, specifically soy production and livestock farming, is the main economic activity in the state. Its elevation varies from 24 to 1,100 m (Teodoro et al. 2016). The average annual temperatures range from 20 to 26 °C and the average annual precipitation fluctuates between 1,000 and 1,900 mm.

Fig. 1 Upper left: location of the state of Mato Grosso do Sul in Brazil; upper right: separation between biomes (Cerrado, Atlantic Forest and Pantanal); lower left: map of altitude; lower right: map of climatic classification. The locations of the meteorological stations are shown on both lower maps

The Köppen climate classification shows a diversity of climate types: “Aw” (in the southeast and north of the state), “Am” (central region), “Af” (southwest) and “Cfa” (south of the state). The climate in the southwest of Mato Grosso do Sul, in the south of the Pantanal (between latitudes 21 and 22°S), is tropical rainforest (“Af”), with rain distributed evenly throughout the year. The central part of the state has a predominantly monsoon climate (“Am”), with a short dry season in winter. In the north, in a small part of the central region and in the southeast of the state, the climate is savanna (“Aw”), predominantly dry in winter and rainy in summer. Only in the south of the state is the climate humid in all seasons, with a hot summer (“Cfa”) and temperatures above 22 °C.

The biomes of the state of Mato Grosso do Sul include areas of the Atlantic Forest (14% of the state area), the Cerrado (61% of the state area) and the Pantanal (25% of the state area) (Fig. 1). Located in humid tropical areas and within an immense network of rivers and streams, they are closely linked to atmospheric conditions and poor soils. The vegetation in the biomes occurs in areas permanently affected by water (humid areas with groundwater at or very close to the surface), seasonally flooded areas (lowland and igapó, riparian vegetation), or areas not affected by flooding (upland). The Atlantic Forest is an important biome because of its abundant biological diversity and is a conservation priority, since its area has been considerably reduced. The Cerrado of Mato Grosso do Sul lies in two hydrographic regions of Brazil, Paraná and Paraguay, and is characterized by savanna, but also by seasonal forest and grassland. The Pantanal is the largest wetland in the world and needs to be preserved in the face of environmental degradation (Teodoro et al. 2016).

2.2 Data

The historical time series used in this work consist of the average monthly maximum air temperature at the meteorological stations in the Cerrado, Pantanal and Atlantic Forest biomes (Fig. 1). All of the stations are located in the hydrographic basins of the Paraguay and Paraná rivers, in the territory of the state of Mato Grosso do Sul in Brazil. The maximum temperature data for the biomes were obtained from the historical records in the meteorological database of the National Institute of Meteorology (INMET 2020). The historical records cover the period from 2007 to 2018, i.e. 12 years of observations. Only consistent data covering a period of at least 10 years were used in this research. Series with more than 10% of missing data per year were excluded.

2.3 Methods

In this study the LN, GUM and GEV probability distributions were applied to model the maximum monthly temperature in the Cerrado, Pantanal and Atlantic Forest biomes. The probability density functions (pdfs) and their corresponding cumulative distribution functions (cdfs) are presented in Table 1.

Table 1 List of the probability density functions (pdfs), cumulative distribution functions (cdfs) and supports of the LN, GUM and GEV distributions

The parameter \(\mu \in {\mathbb{R}}\) is a location parameter, \(\sigma >0\) is a scale parameter and \(\xi \in {\mathbb{R}}\) is a shape parameter. The parameter \(\xi\) is related to the tail weight of the GEV distribution and, for this reason, is also called the tail index. The GUM distribution is a particular case of the GEV distribution, obtained when the shape parameter tends to zero \(\left(\xi \to 0\right)\).
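The limiting case can be checked directly. As a sketch, assuming the GEV cdf is written in the parameterization consistent with Eq. (3), i.e. \(F\left(x\right)=\mathrm{exp}\left\{-{\left[1+\xi z\right]}^{-1/\xi }\right\}\) with the shorthand \(z=\left(x-\mu \right)/\sigma\) (a symbol introduced here only for brevity), the standard limit \({\left(1+\xi z\right)}^{-1/\xi }\to {e}^{-z}\) gives:

$$\underset{\xi \to 0}{\mathrm{lim}}\,\mathrm{exp}\left\{-{\left[1+\xi z\right]}^{-\frac{1}{\xi }}\right\}=\mathrm{exp}\left\{-{e}^{-z}\right\}=\mathrm{exp}\left\{-{e}^{-\frac{x-\mu }{\sigma }}\right\},$$

which is the cdf of the GUM distribution with the same \(\mu\) and \(\sigma\).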

The estimates of the parameters for each distribution were obtained using the maximum likelihood method (ML). The log-likelihood functions of the LN, GUM and GEV distributions are given, respectively, by formulas:

$$\mathrm{ln}\,L\left(\mu ,\sigma \right)=-\sum_{i=1}^{n}\mathrm{ln}{x}_{i}-\frac{n}{2}\mathrm{ln}{\sigma }^{2}-\frac{n}{2}\mathrm{ln}2\pi -\sum_{i=1}^{n}\frac{{\left(\mathrm{ln}{x}_{i}-\mu \right)}^{2}}{{2\sigma }^{2}},$$
(1)
$$\mathrm{ln}\,L\left(\mu ,\sigma \right)=-n\mathrm{ln}\sigma -\sum_{i=1}^{n}\frac{{x}_{i}-\mu }{\sigma }-\sum_{i=1}^{n}{e}^{-\frac{{x}_{i}-\mu }{\sigma }},$$
(2)
$$\mathrm{ln}L\left(\mu ,\sigma ,\xi \right)=-n\mathrm{ln}\sigma -\sum_{i=1}^{n}\left\{\left(\frac{1+\xi }{\xi }\right)\mathrm{ln}\left[1+\xi \left(\frac{{x}_{i}-\mu }{\sigma }\right)\right]+{\left[1+\xi \left(\frac{{x}_{i}-\mu }{\sigma }\right)\right]}^{-\frac{1}{\xi }}\right\}.$$
(3)

Estimates of the distribution parameters are calculated by maximizing the log-likelihood function in relation to the parameters. Taking the partial derivatives of the \(\mathrm{ln}L\) function with respect to each of the parameters and making these derivatives equal to zero, the likelihood equations are obtained. The solutions to these equations are called maximum likelihood estimates of the parameters.
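As an illustration of this procedure for the GUM case, the two likelihood equations derived from (2) can be solved numerically. The sketch below is not the code used in the study (which relied on R packages); it uses only the Python standard library, a simple bisection, and a hypothetical 12-value sample. The function names `gumbel_loglik` and `fit_gumbel_ml` are illustrative choices.

```python
import math

def gumbel_loglik(x, mu, sigma):
    """Gumbel log-likelihood as written in Eq. (2)."""
    z = [(xi - mu) / sigma for xi in x]
    return -len(x) * math.log(sigma) - sum(z) - sum(math.exp(-zi) for zi in z)

def fit_gumbel_ml(x, tol=1e-10):
    """Solve the two likelihood equations of Eq. (2) for (mu, sigma)."""
    n = len(x)
    mean = sum(x) / n
    x_min, x_max = min(x), max(x)

    def weights(sigma):
        # exp(-(x_i - x_min) / sigma); shifting by x_min avoids overflow
        return [math.exp(-(xi - x_min) / sigma) for xi in x]

    def residual(sigma):
        # After eliminating mu, d lnL / d sigma = 0 reduces to
        # sigma = mean(x) - (weighted mean of x); return the residual.
        w = weights(sigma)
        return sigma - mean + sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

    # The residual is negative for very small sigma and positive at the range.
    lo, hi = 1e-6 * (x_max - x_min), x_max - x_min
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    # d lnL / d mu = 0 gives sum(exp(-(x_i - mu) / sigma)) = n
    mu = x_min - sigma * math.log(sum(weights(sigma)) / n)
    return mu, sigma

# Hypothetical monthly maxima (deg C) for 12 years; not data from the study.
sample = [29.8, 30.4, 31.1, 29.5, 30.9, 31.6, 30.2, 32.0, 30.7, 29.9, 31.3, 30.5]
mu_hat, sigma_hat = fit_gumbel_ml(sample)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The GEV case has no comparable reduction to a single equation and is normally handled by a general-purpose numerical optimizer.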

For further consideration let us assume that F(x) is an estimated distribution in the procedure described above.

In this study, we used the Kolmogorov–Smirnov (KS) test to assess whether the maximum temperature data come from a hypothesized continuous distribution. Let us assume that we have a random sample \({x}_{1},{x}_{2},\dots ,{x}_{n}\) from a theoretical distribution with cdf \(F\left(x\right)\). The empirical cdf is given by:

$${F}_{n}\left(x\right)=\frac{1}{n} \sum_{i=1}^{n}{I}_{\left\{{x}_{i}\le x\right\}}$$
(4)

where \({I}_{\left\{{x}_{i}\le x\right\}}\) is the indicator function, equal to 1 if \({x}_{i}\le x\) and 0 otherwise, so that the sum counts the observations smaller than or equal to \(x\).

The Kolmogorov–Smirnov (D) statistic is based on the largest difference between the theoretical and empirical cdf:

$$D=\underset{1\le i\le n}{\mathrm{max}}\left[\left|\widehat{F}\left({x}_{(i)}\right)-\frac{i-1}{n}\right|,\left|\frac{i}{n}-\widehat{F}\left({x}_{(i)}\right)\right|\right]$$
(5)

where \(\widehat{F}\left(x\right)\) is an estimate of the cdf and \({x}_{(1)},{x}_{(2)},\dots ,{x}_{(n)}\) are the observations in ascending order.

The null hypothesis that the empirical distribution is equal to one of the estimated distributions is rejected (i.e. the data do not follow the specified distribution), at the chosen level of significance \(\alpha\), if the test statistic \(D>D\left(\alpha \right)\), where \(D\left(\alpha \right)\) is a critical value of the KS test. The significance level adopted in this study is \(\alpha\) = 0.05.
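The statistic in (5) is straightforward to compute once a cdf has been estimated. The following minimal sketch uses a GUM cdf with hypothetical parameter values and a hypothetical sample; the function names are illustrative, and the critical value \(D\left(\alpha \right)\) would still have to be taken from a KS table or from statistical software.

```python
import math

def gumbel_cdf(x, mu, sigma):
    """Cdf of the Gumbel (GUM) distribution."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D of Eq. (5) for a sample and a fitted cdf."""
    xs = sorted(sample)          # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # distance to the empirical cdf just before and just after x_(i)
        d = max(d, abs(f - (i - 1) / n), abs(i / n - f))
    return d

# Hypothetical sample (deg C) and hypothetical fitted parameters.
sample = [29.8, 30.4, 31.1, 29.5, 30.9, 31.6, 30.2, 32.0, 30.7, 29.9, 31.3, 30.5]
d_gum = ks_statistic(sample, lambda x: gumbel_cdf(x, mu=30.3, sigma=0.7))
print(f"D = {d_gum:.3f}")
```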

Then, the AICc and BIC criteria were calculated for all the estimated models. The model with the lowest values of these two criteria was selected (Burnham and Anderson 2004). The AIC and BIC criteria were obtained using the following equations, respectively,

$$AIC = - 2\ln L + 2k,$$
(6)
$$BIC = - 2\ln L + k\ln n.$$
(7)

Here \(\mathrm{ln}L\) is the natural logarithm of the maximized likelihood function, \(k\) is the number of parameters in the model and \(n\) is the sample size.

When the ratio between the sample size (n) and the number of model parameters (k) is less than 40, the use of the corrected AICc is recommended, as was suggested by Burnham and Anderson (2004), and Fabozzi et al. (2014). As the number of observations in the present work is \(n=12\) the corrected AIC was adopted. The corrected AIC is given by:

$$AI{C}_{c}=-2\mathit{ln}L+2k+\frac{2k\left(k+1\right)}{n-k-1}.$$
(8)
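Equations (6)–(8) can be evaluated directly from a maximized log-likelihood. The short sketch below shows the size of the small-sample correction for \(n=12\); the log-likelihood value passed in is a placeholder, not a result from this study.

```python
import math

def aic(loglik, k):
    """Eq. (6): Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Eq. (7): Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

def aicc(loglik, k, n):
    """Eq. (8): AIC with the small-sample correction term."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

n = 12                      # years of observations per month
placeholder_loglik = -15.0  # hypothetical value, for illustration only
for name, k in [("LN", 2), ("GUM", 2), ("GEV", 3)]:
    correction = aicc(placeholder_loglik, k, n) - aic(placeholder_loglik, k)
    print(f"{name}: k = {k}, n/k = {n / k:.1f}, AICc - AIC = {correction:.3f}")
```

With \(n=12\), the correction term is \(12/9\approx 1.33\) for the two-parameter LN and GUM models and \(24/8=3\) for the three-parameter GEV model, so the extra parameter of the GEV is penalized more heavily than under the uncorrected AIC.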

The coefficient of determination R2 and the root of the mean square error (RMSE) were also used to measure a goodness-of-fit of the examined pdfs to model the temperature data. The R2 and RMSE statistics are provided, respectively, by:

$$R^{2} = \frac{{\mathop \sum \nolimits_{{i = 1}}^{n} \left( {\hat{F}\left( {x_{i} } \right) - \overline{F} } \right)^{2} }}{{\mathop \sum \nolimits_{{i = 1}}^{n} \left( {\hat{F}\left( {x_{i} } \right) - \overline{F} } \right)^{2} + \mathop \sum \nolimits_{{i = 1}}^{n} \left( {F_{n} \left( {x_{i} } \right) - \hat{F}\left( {x_{i} } \right)} \right)^{2} }},$$
(9)

where \(\widehat{F}\left(x\right)\) is the estimated cdf and \(\overline{F} = \frac{1}{n}\mathop \sum \limits_{{i = 1}}^{n} \hat{F}\left( {x_{i} } \right)\).

The RMSE statistic is given by the formula:

$$RMSE={\left[\frac{1}{n}{\sum }_{i=1}^{n}{\left({F}_{n}\left({x}_{i}\right)-\widehat{F}\left({x}_{i}\right)\right)}^{2}\right]}^\frac{1}{2}.$$
(10)

The distribution with the lowest AICc, BIC and RMSE and the largest R2 has the best fit to the original data.
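Both statistics compare the empirical cdf \({F}_{n}\) with the estimated cdf \(\widehat{F}\) at the observed points. A minimal sketch of (9) and (10), with illustrative function names and a toy sample chosen so that the result can be checked by hand, could look as follows.

```python
import math

def ecdf(sample):
    """Empirical cdf F_n of Eq. (4)."""
    n = len(sample)
    return lambda x: sum(1 for xi in sample if xi <= x) / n

def rmse(sample, cdf):
    """Eq. (10): root mean square error between F_n and the estimated cdf."""
    fn = ecdf(sample)
    return math.sqrt(sum((fn(x) - cdf(x)) ** 2 for x in sample) / len(sample))

def r_squared(sample, cdf):
    """Eq. (9): coefficient of determination based on the estimated cdf."""
    fn = ecdf(sample)
    f_hat = [cdf(x) for x in sample]
    f_bar = sum(f_hat) / len(f_hat)
    explained = sum((f - f_bar) ** 2 for f in f_hat)
    residual = sum((fn(x) - f) ** 2 for x, f in zip(sample, f_hat))
    return explained / (explained + residual)

# Toy check: for this sample the uniform cdf on [0, 1] equals F_n at every
# observation, so the fit is perfect (RMSE = 0 and R^2 = 1).
toy = [0.25, 0.5, 0.75, 1.0]
print(rmse(toy, lambda u: u), r_squared(toy, lambda u: u))
```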

The return time (or return period) represents the inverse of the probability that a given event occurs in any given year; the corresponding magnitude is called the return level. The return time is the average time (in years) between occurrences of that event. In practical terms, its meaning is: if an event of a given intensity occurs, what is the average time (T) expected until an event of that intensity occurs again? By definition, the return time associated with an event E of probability p is expressed by:

$$T=\frac{1}{P\left(E\right)}=\frac{1}{p}.$$
(11)

In this paper, the event E is the maximum temperature exceeding a certain value \({x}_{p}\), and the probability p of this exceedance is \(1-F\left({x}_{p}\right)\). Therefore:

$$T=\frac{1}{p}=\frac{1}{1-F\left({x}_{p}\right)}.$$
(12)

As \(F\left({x}_{p}\right)=1-p\), the return level \({x}_{p}\), which the maximum monthly temperature is expected to exceed on average once every T years, is obtained as the solution of the equation:

$$F\left({x}_{p}\right)=1-p\Rightarrow {x}_{p}={F}^{-1}\left(1-p\right).$$
(13)

From the relation \(p = \frac{1}{T}\) and using (13) with the cdfs of the LN, GUM and GEV distributions, the quantile functions of these distributions are provided, respectively, by:

$${x}_{p}\left(T\right)={e}^{\mu +\sigma {\Phi }^{-1}\left(1-\frac{1}{T}\right)},$$
(14)
$${x}_{p}\left(T\right)=\mu -\sigma \mathit{ln}\left[-\mathit{ln}\left(1-\frac{1}{T}\right)\right],$$
(15)
$${x}_{p}\left(T\right)=\mu -\frac{\sigma }{\xi }\left\{1-{\left[-\mathit{ln}\left(1-\frac{1}{T}\right)\right]}^{-\xi }\right\}.$$
(16)

The estimated return levels \({\widehat{x}}_{p}\), which are the maximum temperatures expected for return times T, are obtained by substituting the maximum likelihood estimates of the parameters into (14), (15) and (16).
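The return levels can be sketched as plain quantile functions. In the code below, the GEV case is obtained by inverting the cdf consistent with the log-likelihood (3), \(F\left(x\right)=\mathrm{exp}\left\{-{\left[1+\xi \left(x-\mu \right)/\sigma \right]}^{-1/\xi }\right\}\), and each function can be checked by the round trip \(F\left({x}_{p}\right)=1-1/T\). All parameter values are hypothetical and the function names are illustrative.

```python
import math
from statistics import NormalDist

def return_level_ln(T, mu, sigma):
    """LN return level: exp(mu + sigma * Phi^{-1}(1 - 1/T))."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(1.0 - 1.0 / T))

def return_level_gumbel(T, mu, sigma):
    """GUM return level: mu - sigma * ln(-ln(1 - 1/T))."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / T))

def return_level_gev(T, mu, sigma, xi):
    """GEV return level for xi != 0, from inverting the GEV cdf."""
    y = -math.log(1.0 - 1.0 / T)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def gev_cdf(x, mu, sigma, xi):
    """GEV cdf in the parameterization consistent with Eq. (3)."""
    return math.exp(-((1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)))

# Hypothetical parameter values, for illustration only.
for T in (10, 20, 30, 40, 50, 100):
    gum = return_level_gumbel(T, mu=30.0, sigma=0.8)
    gev = return_level_gev(T, mu=30.0, sigma=0.8, xi=-0.2)
    lnv = return_level_ln(T, mu=math.log(30.0), sigma=0.03)
    print(f"T = {T:3d}: GUM {gum:.2f}, GEV {gev:.2f}, LN {lnv:.2f}")
```

With a negative shape parameter, the GEV return levels grow more slowly with T than the GUM ones and approach a finite upper limit.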

All statistical analyses were performed using the R software (R Core Team 2020). The evd (Stephenson 2002), EnvStats (Millard 2013) and fBasics (Wuertz et al. 2014) packages were used to study the data. In particular, the evd package was used because it provides functions specific to the analysis of extreme values.

3 Results

In this section, we present only general results and the results for the Cerrado, while for the remaining biomes detailed results are shown in the tables and figures delivered in the Supplementary Material (SM).

The spatial distribution of the maximum values of the average annual air temperature in the areas of the Cerrado, Pantanal and Atlantic Forest is shown in Fig. 2. This distribution shows a relative increase in temperature from the northeast to the west of the biomes (Cerrado and Pantanal), with the highest annual average, 35 °C, in the western part of the state in the city of Corumbá (Pantanal). The lowest temperatures are found in the northeast, in the cities of Costa Rica and Figueirão, with an average maximum temperature of 28 °C.

Fig. 2 Spatial distribution of the maximum values of the average annual air temperature for the biomes of Mato Grosso do Sul (2007–2018)

The variation of the monthly averages of the maximum air temperature for biomes in Mato Grosso do Sul can be seen in Fig. 3. In general, the same trend of the relative increase already highlighted in the analysis of the annual average temperature has been observed. The average maximum temperature in the Cerrado is 28 °C with the range between a minimum of 24 °C and a maximum of 33 °C. The highest temperature occurs in the spring season with 30 °C, and the lowest in autumn–winter (24 °C), while in the Pantanal biome the maximum occurs in the spring with 32 °C and the minimum in the autumn with 27 °C.

Fig. 3 Monthly averages of maximum air temperatures for the biomes of Mato Grosso do Sul (2007–2018)

The month with the highest temperature is October, when 75% of the region has an average maximum temperature above 30.5 °C. This month had an average maximum temperature of 31 °C, varying between 27 °C in the region and 33 °C in the Pantanal biome. The months of May, June and July had the lowest temperatures, with average maximum values below 28 °C.

Tables 2, A.1 and A.6 (SM) show the descriptive statistics of the monthly maximum temperature data for the Cerrado, Pantanal and Atlantic Forest biomes. On average, the months from August to December had the highest maximum temperatures in the biomes in the period from 2007 to 2018, with maximums that varied from 28.7 °C (August) to 30.3 °C (September) in the Cerrado, from 30.9 °C to 32.4 °C (in the same months) in the Pantanal, and from 26.9 °C to 28.6 °C in the Atlantic Forest. In the Pantanal, relatively high monthly maximum temperatures also occurred from January to March (31.0 to 31.6 °C).

Table 2 Descriptive statistics for the monthly maximum air temperature data in the Cerrado (2007–2018)

The coefficient of variation (CV%) shows that the dispersion of the average maximum temperature data differs between stations. The greatest variability relative to the average values of the maximum monthly temperature, i.e. the highest values of the coefficient of variation, occurred in the Cerrado in May and October–December (CV 4.3 to 6.5%; Table 2), in the Pantanal in January, September and November (CV 5.3 to 6.1%; Table A.1, SM) and in the Atlantic Forest in April, July and October (CV 5.4 to 6.1%; Table A.6, SM).

The results show that all the stations have different values of asymmetry. Negative asymmetry coefficients (CS) occurred in the Cerrado in February, March, May and June, and in the Pantanal in February, June, August and October. In the Atlantic Forest biome, the CS is predominantly negative (for eight months). The asymmetry of the empirical distributions in each of the evaluated months can also be analyzed on the basis of Figs. 4, A.1 and A.3 (SM). These figures show the empirical distribution of the maximum temperature data (black line) of the biomes during the study period, together with the estimated densities of the GEV, GUM and LN distributions. They allow a preliminary check of whether the estimated densities are close to the empirical distribution.

Fig. 4 The empirical distribution of maximum air temperature data (°C) and the density estimate of the GEV, GUM and LN distributions (Cerrado)

In all months, the kurtosis coefficients (CK) were less than 3, showing that the empirical distributions are platykurtic. Variables with a flatter distribution (lower concentration) than the normal distribution have a negative kurtosis value (Tables 2, A.1 and A.6, SM). Greater differences in the CK values are visible for the Cerrado and Pantanal biomes. Positive values of the coefficient were obtained for the Cerrado in August and November–December and for the Pantanal in January, September and November–December (Tables 2 and A.1, SM). In May, for example (Table 2), the distribution of the maximum temperature data for the Cerrado has a slight asymmetry to the left (CS = −0.72) and the empirical distribution is flat (CK = 0.15). For the Atlantic Forest only negative kurtosis was obtained (Table A.6, SM).

Tables 3, A.2 and A.7 (SM) show the maximum likelihood estimates of the parameters of the GEV, GUM and LN distributions. The likelihood is the joint density of the observed data regarded as a function of the distribution parameters. In the GEV distribution for the Cerrado (Table 3), ξ < 0 (ξ is the shape parameter) was obtained for most months, which means a distribution with a bounded right tail. Positive ξ was obtained only for April and November–December, which means a heavier right tail and an increased probability of the occurrence of extreme air temperature values. For the Pantanal, ξ < 0 was obtained for 6 months and ξ > 0 for the remaining period, with the highest positive value occurring in May and the lowest in June (Table A.2, SM). In the Atlantic Forest, negative values of ξ were found in all months (Table A.7, SM).

Table 3 Estimates of the parameters of pdfs for monthly data (Cerrado)
Table 4 Results of the goodness-of-fit tests and information criteria for the estimated distributions (Cerrado)
Table 5 The selection of probability distributions according to goodness-of-fit tests and information criteria (Cerrado)

Tables 4, A.3 and A.8 (SM) show the results of the KS test and the model selection criteria for each month. According to the results of the KS test, all three distributions are close to the maximum temperature data for the biomes of Mato Grosso do Sul in the evaluated period (p-value > 0.05).

Table 6 Probabilities of occurrence of maximum monthly air temperature of over 28, 29, 30, 31 and 32 °C in the Cerrado

Tables 5, A.4 and A.9 (SM) present a summary of which model provided the best fit in each month, according to the results of the model selection criteria shown in Tables 4, A.3 and A.8 (SM). In the months that had, on average, the highest maximum temperature from 2007 to 2018 (August to December), the GUM and GEV extreme value distributions performed best. In the coldest months of the year (May to July), on the other hand, the LN distribution performed best. The GUM and GEV distributions also showed good results in March and April, while the LN distribution was more adequate to model the maximum temperature data in February. January was the only month in which no distribution could be identified as performing better than the others.

Tables 6, A.5 and A.10 (SM) show the probabilities of the occurrence of maximum temperatures higher than 28, 29, 30, 31 and 32 °C for all months of the year. Again, the period in which higher maximum temperatures occur in the biomes is evident. The highest probabilities are observed between August and December, in comparison with the other months of the year, already at the 28 °C threshold.
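Such exceedance probabilities follow from a fitted cdf as \(P\left(X>x\right)=1-F\left(x\right)\). A minimal sketch for the GUM distribution, with hypothetical parameter values rather than the estimates reported in the tables, is given below.

```python
import math

def gumbel_cdf(x, mu, sigma):
    """Cdf of the Gumbel (GUM) distribution."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def exceedance_probability(threshold, mu, sigma):
    """P(X > threshold) = 1 - F(threshold) for a fitted GUM distribution."""
    return 1.0 - gumbel_cdf(threshold, mu, sigma)

# Hypothetical parameters for one month; not estimates from the study.
mu, sigma = 30.0, 0.8
probs = {t: exceedance_probability(t, mu, sigma) for t in (28, 29, 30, 31, 32)}
for t, p in probs.items():
    print(f"P(Tmax > {t} C) = {p:.3f}")
```

The inverse of each probability is the return time of Eq. (11) for the corresponding threshold.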

Figures 5, A.2 and A.4 (SM) show the maximum temperature expected for the biomes of Mato Grosso do Sul (MS), considering return times of 10 to 100 years. The largest expected maximum temperatures are observed from August to December. According to the results for January obtained using the GEV distribution, a maximum temperature greater than or equal to 29.78 °C is expected to occur on at least one day a month on average once in 100 years, while according to the GUM distribution this temperature is 31.15 °C. For the GUM distribution, the highest expected temperature in each month corresponds to the return time of 100 years.

Fig. 5 The maximum air temperature (°C) expected in the Cerrado, for the return times of 10, 20, 30, 40, 50 and 100 years

The results of the GUM and GEV distributions show an increase in the expected maximum temperature between the return times of 10 and 100 years, ranging from 0.89 °C in September (GEV) to 3.56 °C in December (GEV). In August, October and November an increase within this range is expected (Fig. 5).

The GUM distribution is the model that generally provides the highest return levels (Fig. 5). In January–March, May, and June–October, the highest temperature levels were predicted by this distribution. In April, November and December the highest levels were given by the GEV distribution. Regarding the lowest maximum temperature levels expected, the GEV distribution showed this result in 7 months and the LN distribution in 5 months of the study. The LN distribution proved to be a more conservative model in relation to the lowest expected maximum temperature levels, compared to the GUM distribution. The LN distribution showed the best performance in February.

4 Discussion

Many studies suggest that it is virtually impossible to effectively forecast record high, extreme air temperatures. It is only possible to analyze the probability and frequency of such events, which was confirmed in our study. According to Hyndman and Fan (2010), the occurrence of the hottest temperature is an extreme event, and the best way of modeling it is with EVT. In the case of GEV methods, the observations close to the central value are omitted and only extreme values are used to estimate the parameters of theoretical distributions (Gençay and Selçuk 2004), which increases the efficiency of the method.

In the biomes of Mato Grosso do Sul, we can observe the effects of the seasonality of the air temperature cycle, which manifest themselves in the occurrence of phases of extremely high values. A measurable index of seasonality indicates changes in the thermal conditions of the biomes, which may be due to various reasons (Ummenhofer and Meehl 2017). Usually, together with climatic factors, anthropogenic factors connected with the transformation and improper utilization of the natural environment have an incidental impact. The effects of the seasonal occurrence of droughts and fire foci constitute a potential threat for the thermal stability of ecosystems (Marengo et al. 2016).

In the case of the biomes in Mato Grosso do Sul, the changes in the distribution of extreme temperatures can occur due to a shift in the mean, shifts in the variability of the distribution, as well as changes in its symmetry or skewness (toward the hotter part of the distribution). Increased kurtosis (compared to the normal distribution) results in a greater probability of extreme observations, and thus it could be causing new temperature extremes. This situation mainly concerns the Pantanal biome, which is very vulnerable to temperature changes. The Pantanal is characterized by frequent periods of drought, enhanced by fires (Silvério et al. 2013). The increase in extremely high temperatures can contribute to the degradation of one of the largest wetland ecosystems in the world.

In the Cerrado, the probability of exceeding 28 °C is 0.5–0.6 in the first months of the year and reaches 0.9 in the September–November period. For these months, a high probability of the occurrence and exceedance of subsequent temperature records was obtained, including temperatures > 32 °C. In September and October, the probability of exceeding 31 °C is approximately 0.22–0.29; the probability of exceeding 32 °C is lower but still high in October for the GUM distribution, at 0.09–0.1. For the first half of the year, there is practically no risk of exceeding the highest temperature level, i.e. 32 °C, but there is a small risk of exceeding the 30 °C and 31 °C thresholds. The concept of the extreme return level was used to determine the average number of years after which the level of the current record is exceeded. In the time series analyzed, new records (for both 10-year and 100-year periods) may develop slowly; however, it cannot be ruled out that a new record will appear soon. Moreover, in such ranges new records may tend to cluster, i.e. appear in series (Shrivastava et al. 2011).
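The link between exceedance probabilities and return levels can be made concrete for the Gumbel (GUM) case, where both have closed forms. The sketch below is illustrative only: the function names are introduced here, and the location and scale values are hypothetical rather than the parameters estimated in the study.

```python
import math


def gumbel_exceedance(x, loc, scale):
    """P(X > x) for a Gumbel distribution of maxima."""
    return 1.0 - math.exp(-math.exp(-(x - loc) / scale))


def gumbel_return_level(period, loc, scale):
    """Level exceeded on average once every `period` blocks (period > 1)."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / period))


# Hypothetical Gumbel parameters in deg C (assumed for illustration).
loc, scale = 29.5, 1.0

# Return levels for the return periods considered in the text.
return_levels = {
    period: gumbel_return_level(period, loc, scale)
    for period in (2, 5, 10, 30, 50, 100)
}

# Probability of exceeding each temperature threshold in a given block.
exceedance = {t: gumbel_exceedance(t, loc, scale) for t in (28, 29, 30, 31, 32)}
```

By construction, the exceedance probability evaluated at the T-year return level equals 1/T, so a 100-year level has a 0.01 chance of being exceeded in any single year under the fitted model.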

Studies to date have shown that the climate in the biomes is not isolated and is subject to global climate changes. Of particular importance is the El Niño–Southern Oscillation (ENSO) mode of climate variability (Souza and Cavalcanti 2009; Rodrigues et al. 2011; Kayano et al. 2013). This climatic component influences the interannual variability of the air temperature and rainfall in the various states in Brazil (Almeida et al. 2016; Silva Junior et al. 2018; Filho et al. 2019). The regional study in the Midwest of the country showed that ENSO had a noticeable influence on the dynamics of the meteorological systems (de Oliveira-Júnior et al. 2020). According to Santos (2014) and Viganó et al. (2018), meteorological factors such as solar radiation, the relative humidity of the air and the air temperature show an important relationship in the impact zones of the biomes.

The results of the models associated with temperature extremes and severe heat waves showed that future heat waves in many regions of the world, including Brazil (Vincent et al. 2005; Marengo et al. 2016), would become more intense, more frequent and longer lasting in the second half of the twenty-first century. In the southern Amazon basin, it is predicted that the forest will recede due to climate change (Hutyra et al. 2005) and land use practices (Nepstad et al. 2008). The biomes are undergoing a deforestation and urbanization process (Barros et al. 2019), and deforested regions (which include cities) have an even higher temperature, registering up to 5 °C more than nearby forested regions. Some recent studies (Roesch et al. 2009; Scarano and Ceotto 2015) have concluded that in the rainy season there is little difference in temperature between deforested regions and forests, but in the dry season the difference can reach several degrees Celsius. Our research confirmed that the Generalized Extreme Value (GEV) and Gumbel (GUM) distributions are recommended for the warmer months, whereas in the coldest months the Log-Normal (LN) distribution gave a better fit to a series of extreme air temperatures.

In “normal” years, without extreme or prolonged drought, the vegetation of the biomes works as a small sink for carbon dioxide (CO2) and compensates for CO2 emissions from deforestation and burning in the region (Malhi 2012). The largest stocks of carbon and nitrogen in the soil were found in the Atlantic Forest, followed by the Amazon and the Cerrado. As for above-ground carbon and nitrogen stocks, the Atlantic Forest and, especially, the Amazon stand out as the biomes with the largest stocks. Interestingly, only in the Amazon and the Pantanal are carbon and nitrogen stocks higher in above-ground biomass than in the soil, in contrast to the other biomes, in which the largest stocks are concentrated in soils. Nitrogen transfer is significantly higher in the Amazon and Atlantic Forest systems than in herbaceous-shrubby systems such as the Cerrado. Despite large differences in soil carbon stocks, CO2 fluxes to the atmosphere did not vary greatly between biomes. In the case of biological nitrogen fixation (BNF), the largest inputs are associated with the forest systems of the Atlantic Forest, followed by the Cerrado and finally the Pantanal. Atmospheric nitrogen deposition was similar between the biomes. However, when major droughts occur, the biomes can temporarily become a source of CO2 emissions into the atmosphere. In addition, by producing and accumulating a large amount of combustible material, droughts contribute to forest fires in areas previously not subject to this phenomenon; these fires emit more CO2 and contribute to further fires in the years to come (Malhi 2012).

The combination of global climate changes and dramatic changes in land cover, with large-scale deforestation, can determine changes in the local climate regime and, consequently, in the structure and composition of native vegetation. The “savannization” process emerged as an important warning of a possible structural change in the region's vegetation cover. According to Silvério et al. (2013), after episodes of intense and frequent fires that exceed forest resilience, restoration may be a long-term process. Therefore, knowledge about extreme thermal conditions is essential to understand the relationship of climate change (warming) and human activities (deforestation, fire) with the environment of the Cerrado, Pantanal and Atlantic Forest biomes. The biomes, largely functioning as wetlands, are particularly sensitive ecosystems, dependent primarily on water conditions and the air temperature. Although different types of wetlands respond differently to climate change, the overall trend is clear. Global warming raises temperatures, increases evaporation and ultimately leads to the drainage of wetlands.

However, it should be noted that a profound change in the structure and functioning of ecosystems would lead to significant losses in carbon stocks in both soil and vegetation. In addition to carbon losses, there would be other physiological and phenological changes in the Cerrado. Such changes would be reflected not only in the carbon cycle, but also in the nitrogen cycle. The Atlantic Forest stores appreciable amounts of carbon and nitrogen in its soils, mainly at higher altitudes. Predicted increases in air temperature in central-western Brazil would lead to an increase in respiration and decomposition processes, generating an increase in carbon and nitrogen losses to the atmosphere.

In areas where drought lasts longer, there could, in theory, be an increase in the incidence of fire, which in turn would favor the appearance of herbaceous vegetation. This would imply important changes in the functioning of the biomes (especially the Cerrado), related to a potential decline in productivity in the face of projected climate changes (IPCC 2012).

5 Conclusions

The statistical distribution properties of the air temperature are of particular importance for assessing the structure and durability of the biomes. They provide information concerning the maintenance or disruption of an ecosystem's thermal stability.

1. We estimated the parameters of GEV, GUM and LN distributions for extreme air temperatures in the Cerrado, Pantanal and Atlantic Forest biomes of the state of Mato Grosso do Sul in Brazil. The distributions were fitted satisfactorily to the monthly data and can be used to estimate extreme levels of maximum temperatures. Next, we calculated the probabilities that the maximum monthly temperatures exceed 28, 29, 30, 31 and 32 °C. Temperature estimates for each month and for the 2-, 5-, 10-, 30-, 50- and 100-year return periods showed that temperatures are increasing over time. The factors that modify the air temperature distribution and generate extreme values within the range of biomes include the type and resistance of the biomes to climatic factors, as well as an increase in deforestation and burning and the related extensive fires and aerosol emissions. Significant and permanent changes in the biomes are also caused by various forms of anthropogenic activity.

2. The AICc, BIC, RMSE and the coefficient of determination (R2) were used to identify the distribution that gave the best results for each month and each biome. The GUM distribution gives the highest return values. We recommend using the GUM and GEV distributions in the warmer months for the biomes in the state of Mato Grosso do Sul, with the exception of February, when the LN distribution showed the best performance. On the other hand, to model the maximum temperature data in the coldest months of the year in the biomes (May to July), we recommend using the LN distribution.

3. Our results can be used in the interpretation of the influence of the air temperature on the formation of fires and in the interpretation of biological and biogeochemical processes taking place in biomes in warm months. The statistical methods we applied may be useful for determining thermal tolerance thresholds and assessing the risk of exceeding maximum values critical for the existence of ecosystems. Understanding the characteristics of climate extremes at regional and local levels is critical not only for the development of preparedness and early warning systems but also for the development of a strategy for adaptation to climate change, together with measures alleviating the effects of extreme air temperatures.
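The model selection step summarized in conclusion 2 can be sketched with the standard definitions of the four criteria. This is an illustration under stated assumptions, not the authors' code: the function names, the log-likelihood values and the sample size are hypothetical, and the study does not state here which observed and predicted quantities entered the RMSE and R2 calculations.

```python
import math


def aicc(log_likelihood, k, n):
    """Akaike information criterion with small-sample correction (needs n > k + 1)."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood


def rmse(observed, predicted):
    """Root mean square error between paired values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical fits for one month and one biome:
# name -> (maximized log-likelihood, number of parameters).
n_obs = 40
candidates = {"GEV": (-60.0, 3), "GUM": (-60.5, 2), "LN": (-63.0, 2)}

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lower AICc indicates the preferred model
```

Lower AICc, BIC and RMSE and higher R2 indicate a better fit; in this made-up example the two-parameter GUM model is preferred because the extra GEV parameter does not improve the likelihood enough to offset its penalty.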