Statistical analysis of extreme temperatures in India in the period 1951–2020

Extreme temperatures are directly related to the occurrence of atmospheric extreme events, such as draughts, wildfires, and pollution level increases in urban areas. Policy makers, as well as society, can address such phenomenon by developing and applying methods which estimate and anticipate maximum temperature occurrences. In this research, we aim to develop a spatiotemporal model which analyzes maximum temperature trends values in the Indian 543 microregions between 1951 and 2020. In 27% of those, a maximum temperature above 45∘C was observed, at least in a year. Our analysis indicates further that 80% microregions have maximum temperatures above above 40∘C. Additionally, the results unveiled that East, Southwest, and Northwest microregions were the ones where the maximum temperatures had a higher increase with 2∘C being the average. The model developed is based on a Generalized Extreme Value (GEV) methodology, to estimate the maximum temperature values from 20 and 50 years. The projection for 20 years showed that in 15.83% of those microregions, at least one occurrence of a maximum temperature above 45∘C would occur; while in 50 years, it would happen in 21.54% of the microregions analyzed.


Introduction
There is an increasing number of studies and debates focusing on global warming, trying to unveil its causes and the possible solutions. The rise of the temperatures on Earth, caused by the greenhouse effect (a natural process responsible for the maintenance of globe's heat), is one of its causes, intensified by human's production of greenhouse gases. Considering this situation, the International Panel for Climate Change (IPCC) stated that the twentieth century was the hottest, with temperatures rising 0.7 • C on planet Earth, and predicting its continuous growth throughout the twenty-first century, if large-scale actions to contain this event are not adopted.
Temperature increases has catastrophic consequences, among which, the increase of the living costs, the effects on the economy, the extinction of plant and animal Manuela Oliveira mmo@uevora.pt Extended author information available on the last page of the article. species, and, above all, the risks on human health. Those effects have been studied for different parts of the globe, like Africa (Mugambiwa and Rukema 2019) and the Middle East and North Africa (Driouech et al. 2020). Nevertheless, according to Nagaveni and Anand (2017), one of the main geographical areas to feel the impacts of climate change is southern Asia, mainly India, since "the country is depleting natural resources, thus destroying its environment, mainly due to urbanization, industrialization and economic growth". To prevent depleting natural resources, India is going through a negative socio-economic and environmental change. "Water and air quality worsen day by day due to the increase of various pollutants in the atmosphere. Moreover, the sectors subject to the greatest exposure to climate change are the country's coastal ecosystems, biodiversity and agricultural productivity".
According to the Joint Research Institute on Global Change, in 2009, India had lots of water resources to be used in sanitation, agriculture, cooking, and for drinking. Nevertheless, from one year to the other, India decreases 34% of its available water, making over-exploitation a concern since it might also be affected by climate change. Additionally, the increase on the temperature and its bigger variability within the seasons can lead to a faster melt of Himalayas' glaciers, having as its consequence floods when the glacial lakes surpass their natural limits. In the future, these changes may include less water resources, impacting in the agriculture, urban water supply, and hydropower. 1 Taking into consideration the context presented, students from all parts of the globe have been looking for and developing methods to predict the significant increases in temperature and the places where that phenomenon is more likely to take place. Among the methods researched, the Community Earth System Model low-warming simulations was used to analyze temperature increase in Africa (Mugambiwa and Rukema 2019), and the Regional Climate Model ALADIN-Climate to predict it for the Middle East and North African countries (Driouech et al. 2020). Nevertheless, Extreme Value Theory (EVT) stressed out due to being able to space-time model and predict extreme events, allowing to identify in which regions extreme temperatures may occur and to predict it, so public policies and solutions can be applied.
In fact, EVT is a powerful branch of Statistics and Probability which focus on distribution tails (upper or lower), enabling to quantify events considered extreme or rare through the analysis of minimum or maximum sample groups. It has been studied and used for a long time, applied to different research areas for observing infrequent eventsfrom management risks (McNeil and Zentrum 1999) to performance analysis in emerging markets (Gençay and Selçuk 2004), as well as to analyze the changes in the temperature in Argentina (Rusticucci and Tencer 2008).
In this study, we aim to apply EVT to model extreme temperature occurrences in the 543 geopolitical Indian microregions, to analyze and identify which of them are more susceptible to this event, as well as to generate different levels of return.
We used the data from the India Meteorological Department (IMD) for the Indian subcontinent, and gridded daily maximum temperature data at 1 degree resolution. We can consider the data used unique, since it provides continuous daily information on the maximum and minimum temperatures from 1951 to 2020, with equal spatial intervals, taken from conventional and remote sensing platforms. This data has been authenticated by several researchers who analyzed and evaluated India's temperature extremes (Attri and Rathore 2003;Dash and Mamgain 2011;Dube and Rao 2005;Kjellstrom et al. 2009;Pai et al. 2004;Pingale et al. 2013;Satyanarayana and Rao 2020;Seneviratne et al. 2012;Simmons et al. 2010). In fact, high temperatures, droughts, and floods are dangerous events which can occur frequently due to climatic changes. Research on climate change and, more specifically, about the changes on high temperatures are significant ways to manage the environmental resources on a sustainable way. Thus, it is important to have a complete knowledge on temperature's pattern on a changing environment, helping decision-making and the communities to adapt to extreme climate events.
This paper is divided in five sections (with the first being the introduction and the fifth the conclusion). Section 2 discusses the EVT, the asymptotic distributions of maximum and the generalized distribution of extreme values. Section 3 aims to describe this study's methodologythe statistics used, thematic maps construction, prediction of the Generalized Extreme Value (GEV) parameters through the maximum likelihood method, test for quality of fit and return levels. Section 4 presents the main results. That justifies why there are several publications regarding it, in different areas: to analyze long and short-term strategies in the Brazilian market (Monte-mor et al. 2014) or to study selling stocks in emerging markets (Gençay and Selçuk 2004) in Economy; or even to predict maximum temperatures in Colombia (Núnez-Galeano and Giraldo-Osorio 2016). Despite these studies being relatively recent, and according to Gumbel (1958), the interest in developing extreme event predictive models go back to the seventeenth century, being related to astronomy studies. Fisher and Tippett (1928) exposed the first fundamentals in 1928, introducing three possible types of asymptotic distributions of extreme values, known as Gumbel, Fréchet, and Weibull distributions. Although those first steps nowadays, it was (Gumbel 1958) who published the first study that formalizes the statistical application of these distributions, with his methodology being often applied. We also must consider (Gnedenko 1943) contribution, for showing the necessary and sufficient conditions for the existence of asymptotic distributions of extreme values.

Main limit results
Consider a sample (X 1 , X 2 , . . . , X n ) of independent and identically distributed (iid) arbitrary variables from a main population which distribution function (df) F is unknown.
Considering that M n = max X 1 ≤i≤X n , (X i ) is the maximum of the sample, and assuming that there are normalizing constants a n > 0, b n R and that there is non-degenerate G, such that, for all x, If we choose the proper normalizing constants, G is considered the Generalized Extreme Value (GEV) distribution, given here in the von Mises-Jenkinson (Monte-mor et al. 2014;Coles 2001). By introducing the shape parameter ξ : Weibull (ξ < 0), Gumbel (ξ = 0), and Fréchet (ξ > 0), the GEV model enables to unify the three possible limit max-stables distributions. The shape parameter ξ is linked to the weight of the right-tail and frequently denominated extreme value index (EVI). We can also introduce two more parameters, λ related to location, and δ related to scale. Considering our case study and the unknown character of the normalizing constants a n > 0 and b n ∈ R, we integrate  them in the GEV distribution related to location and scale parameters (λ and δ,) taking us to the model For parameters (ξ ; λ; δ) estimation, maximum likelihood estimation (MLE) method was used because of its adaptability to changes in model structure (Coles 2001). We use the standard parameters of the gev.fit function from the ismev package for the maximum-likelihood fitting optimization method, which uses the Nelder-Mead algorithm and a maximum of 1000 iterations (see Heffernan and Stephenson (2018) for more details).
Histogram, a Quantile Plot, a Return Level Plot, or a Probability Plot can be used to check the model, with block maxima, the largest observations and the peaksover-threshold being the most important methods used by the Statistics of Univariate Extremes domain area (Coles 2001). 2 Return levels (R p ) and return period (p) estimate gave probabilistic inference about upcoming threat. For stationary model, the return level associated with 1/p return period is given by where R p is the return level or event intensity associated with the return period; p, μ, σ are known as location and scale parameters, respectively; and ξ shape of distribution (Coles 2001).
2 For an EVT overview and related topics, see Coles (2001).

Data and study area
India is geographically located between latitude 8.4 and 37.6 N and longitude 68.7 and 97.25 • E. Considering its climatic characteristics, the distribution and occurrence pattern of temperature, for regions-North-West India (NW India), North-East India (NE India), Central India, and Peninsular India (Fig. 1).
To pursue this study, we used the maximum annual temperatures observed in 702 sites in India, between 1951 and 2020. The goal is to identify the microregions which are considered susceptible to extreme temperatures, to recognize the authorities responsible for those places.
To establish the sites, we considered the smaller areas of India's political microregions. That enables us to reduce the generality of the temperatures. In fact, if we considered even larger regions, it would not be possible to identify the approximate or the exact location of the extreme temperatures. This option also allows an improvement on the effectiveness of preventive measures to be studied and applied, and an easier identification of the political leaders responsible for those sites.
To interpolate those places, interpolation by the nearest neighbors up to 0.5 • latitude and longitude was used, a deterministic interpolation that shows the nearest or the geometrically most convenient points in all directions. The 0.5 • distance was selected due to the fact of being half the distance from one site to the other, not allowing the algorithm to choose two sites in the same direction and applying the mean of the temperatures registered in those places. This interpolation method was used because it is easy and fast, justifying being frequently applied in sampled studies. According to Olivier and Hanqiang (2012), there is a higher performance of the algorithm nearest neighbors when compared to conventional interpolating algorithms since the interpolation creates time series for microregions. For the microregions that for some year did not result in any value after the interpolation, another nearest neighbors interpolation was performed considering the average of the temperatures of the neighboring microregions of the border. To build a suitable database, series that had less than 30 years of data were excluded, resulting in the exclusion of the microregion Andaman and Nicobar.
We presented thematic maps of descriptive statistics (minimum, maximum, median, and median absolute deviation), as well as boxplot and histograms to analyse the temperature variability. We used DataMeet 3 for building India as well its States and Union Territories maps. We considered 28 States and 7 Union Territories based on the ECI The EVT modeling used annual data of maximum temperature in the different Indian microregions. The methodology adopted enabled to develop thematic maps identifying the microregions susceptible to higher temperatures and their return levels.
All statistical analysis were performed using the R software, version 4.0.2. (Team 2021). We used the rgdal, (version 1.5-12 (Bivand et al. 2021)) and ggplot2, (version 3.3.2 (Wickham 2016)) packages from the R library. In particular, the rgdal package was used for reading the map of India, while ggplot2 was used for the plots. The ismev package (version 1.42 (Heffernan and Stephenson 2018)) was used for data analysis, as it has a specific function to analyse extreme values.

Results
India integrates 543 microregions. In the last year of this study, 2020, about 58% of them registered, at least for one day of the year, a temperature above 40 • C, and 6% registered an even higher temperature, above 45 • C. Figure 3 shows the histogram and the boxplot of the maximum temperatures in 2020. Figure 4 shows the maximum temperatures in 2020 according to the microregions. A large part of India, mainly central India, faced maximum temperatures above 43 • C between 1951 and 2020. Oppositely, the lowest annual temperatures, not exceeding 35 • C, were registered in Eastern and Southwestern regions.
Moreover over the full period from 1951 to 2020 the annual temperatures ranged from 42.5 • C in year 1954 (Shimla) to 47.5 • C in year 1995 (Banswara) (Fig. 5.) In the chronological period mentioned, the highest increase in the temperature was registered in East, Southwest, and Northwest regions, as we can see in Fig. 6, which represents annual maximum temperature's amplitude. Despite a significant increase in most of the country, there are several microregions in the central region which face a temperature decrease, and others where there was just a small variation on the temperature. Figure 7 demonstrates the median, median absolute deviation, minimum, and the maximum temperatures in the 70 years analyzed, showing a major concern, as in 27% of Indian's microregions there were temperatures above  45 • C and 80% above 40 • C per year. Those temperatures are not consistent with the human body resistance to high temperatures. According to Tita et al. (2009), human body resistance is within the limits of 36.1 to 37.5 • C, depending on air humidity. Considering that the maximum annual temperatures observed in central India already exceed 40 • C, that could represent a great risk of survival.
Those temperatures are not consistent with the human body resistance to high temperatures. According to Tita et al. (2009) human body resistance is within the limits of 36.1 to 37.5 • C, depending on air humidity. Considering that the maximum annual temperatures observed in central India already exceed 40 • C, that could represent a great risk of survival. In this study, GEV was applied to model the 543 microregions, with the parameters ξ, λ, and δ estimated with the maximum likelihood method. In Appendix, Table 1 presents the parameters estimated by the GEV model and its 95% CI,and Figs. 11 to 36 show the distribution of empirical data of maximum temperature during the period of study, with estimated densities from the GEV distribution. By testing the models with the chi-squared test in which the classes for the goodness of fit test were the quartiles 20%, 40%, 60%, and 80%. According to Murthy and Gafarian (1970) and Chernoff and Lehmann (1954), it is recommended that the degrees of freedom of the chi-square test be m − k − 1; with m the number of quantiles and k the number of estimated parameters. The results showed that p-value was greater than 0.01 for 537 models, for the 5 rejected models the graphical tests were applied, which according to Coles (2001) are more suitable, resulting in four models that the GEV did not fit well and the Tonk -Sawai Madhopur, Ajmer, Pali, Jaunpur microregions were excluded from further analyses.
The GEV prediction power was tested (except the microregions of Andaman & Nicobar, Tonk -Sawai Madhopur, Ajmer, Pali, and Jaunpur), the data were separated into two sets, data from 1951 to 2000 and from 2001 to 2020. With the data from 1951 − 2000, a GEV was modeled for each microregion and generated a forecast for the next 20 years. This value is compared with data from 2001 − 2020. Coles (2001) says that the GEV forecast for n years, in some year the observed value will exceed or be very close to the theoretical quantity 1 − 1 p . Of the 538 GEV forecasts for the 20 years (2000 to 2020), 145 were higher than the real observed value, being 27% of the microregions. Figure 8 shows the scatter plot of observed and predicted values.
The GEV prediction power was considered satisfactory, 97% of predicted values were greater than observed values or the difference between them was less than 1, with the root o mean square error (RMSE) of 0.687. Figure 9 shows the observed and predicted values on the maps. The analysis of the model's diagnosis for all the microregions hightlighted the validity of fitted models.
The GEVs distributions for two different return levels of 20 and 50 years are presented in Fig. 10. In the case of the former, the maximum value was 47.02 • C and with 95% confidence interval extending from 46.84 to 47.21 • C while the minimum value was 34.33 • C and with a 95% confidence interval extending from 34.09 to 34.56 • C. In the case of the 50 years return period, the maximum value was higher reaching 47.21 • C and with 95% confidence interval extending from 47.00 to 47.41 • C. The minimum value in this case was also higher (34.60 • C) and with a 95% confidence interval extending from 34.32 to 34.88 • C.
Nevertheless, when compared with the chronology studied, there is a temperature decrease in some microregions. Specifically, 32% of these regions had maximum temperatures higher than 45 • C in the period 1951-2020. Despite that fact, the results are still worrying since the temperature's increase represents a risk for human survival.

Conclusion
With this study, we aimed to estimate the main extreme parameters of interest according to the maximum annual temperatures in India's microregions, between 1951 and 2020. We developed a descriptive analysis materialized in thematic maps, which unveiled the worrying situation in Indian central region. Additionally, although in the East and Southwest regions, the maximum temperatures were the lowest; it was where a higher increase was observed. Twenty seven percent of the microregions face a temperature higher than 45 • C, above the limit of human body resistance as we have seen.
Applying the adjustment quality test, 538 out of 543 models were proven to be effective.The predictive power of the models was tested and found to be satisfactory, with a root mean square error of 0.687.
That enabled to develop return levels for 20 and 50 years, demonstrating that 22% of the microregions will face temperatures above 45 • C if the authorities do not apply measures to fight and prevent this phenomenon.           Manipur, Jabalpur, Jadavpur, Jagatsinghpur, Jahanabad, Jaipur, Jaipur Rural, Jajpur, Jalandhar

Fig. 29
Theoretical and empirical curve of extreme temperatures in microregions Nandyal, Narasaraopet, Narsapuram, Nashik, Navsari, Nawada, Nellore, New Delhi, Nilgiris, Nizamabad, North East Delhi, Author contribution All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Venkata Dodla and G. C. Satyanarayana. The first draft of the manuscript was written by Flavio Ferraz Vieira, Manuela Oliveira, Eugénio Garção, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding Open access funding provided by FCT-FCCN (b-on).
Availability of data and material The datasets generated during and/or analysed during the current study are available in Supplementary Material.
Code availability Not applicable.

Declarations
Ethics All authors have addressed the Ethics requirements and standards.
Consent to participate All authors consent to participate.

Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.