Introduction

Rainfall is the main source of water for domestic use and agriculture in Uttarakhand State. The state has varied topography, which leads to large spatial and temporal variation in rainfall. An analysis of the state's rainfall series would support better management and optimum use of its water resources. One of the challenging tasks with rainfall data is interpreting past records of rainfall events in terms of future probabilities of occurrence. Understanding the distribution of flood-causing rainfall may therefore play an important role in the sustainable development and conservation of the state's natural resources. Identifying the statistical distribution that best fits annual rainfall has long been a topic of interest for hydrologists, meteorologists, and other water resource personnel. Knowledge of the rainfall distribution is important for stochastic modeling, rainfall frequency analysis, and rainfall trend analysis. The aim of such a study is not so much to explore the properties of rainfall as to use rainfall sequences as inputs to other models of hydrologic processes (Buishand 1978). Distribution fitting is a way of selecting the statistical distribution that best fits the available records. Return periods can also be calculated using various probability distributions (Upadhaya and Singh 1998).

In Costa Rica, the normal distribution was the best-fitted probability distribution for annual rainfall (Waylen et al. 1996). According to Abdullah and Al-Mazroui (1998), the gamma distribution was the best for the annual rainfall of Saudi Arabia. The gamma distribution has also been applied in Africa for monitoring drought (Husak et al. 2006). The log-Pearson type III distribution fitted rainfall well in Texas (Salami 2004) and in China (Lee 2005). Naghavi and Yu (1995) chose the generalized extreme value (GEV) distribution for Louisiana, and the same distribution was selected by Pilon et al. (1991) in Ontario and by Alahmadi et al. (2014) in Saudi Arabia. Park and Jung (2002) used the Kappa distribution to generate rainfall quantile maps for Korea. Pearson and log-Pearson distributions were used in Golestan, Iran (Osati et al. 2010). Alghazali and Alawadi (2014) found that no single distribution was suitable for describing rainfall across Iraq. The annual rainfall in Sudan was best fitted by the normal and gamma distributions (Mohamed and Ibrahim 2015). Libyan monthly rainfall was best fitted by the gamma distribution (Sen and Eljadid 1999). Sharma and Singh (2010) used daily rainfall series and found that the lognormal and gamma distributions fitted best. Tao et al. (2002) proposed a systematic procedure to compare the performance of different probability distributions. The GEV distribution provided a good fit to monthly rainfall data in Bangladesh (Ghosh et al. 2016). Four probability distributions (normal, log-normal, log-Pearson type III and Gumbel) were applied over Pakistan to find the best-fit distribution of yearly rainfall recorded at 24 h (Amin et al. 2016).

The selection of a best-fit distribution has also been applied to discharge series in the USA (Benson 1968; Vogel et al. 1993), the UK (NERC 1975), Australia (McMahon and Srikanthan 1981), and Turkey (Haktanir 1991). A review of the selection of the best distribution was given by Curmane (1989). Various probability distributions were fitted to the rainfall-runoff behavior of the Tagwai dam in Nigeria; the normal distribution was the most appropriate for the yearly daily rainfall and the log-Gumbel distribution for predicting the yearly maximum daily runoff (Olumide et al. 2013). Phien and Ajirajah (1984) assessed the fit of the log-Pearson III distribution to flood and maximum rainfall series. Bhakar et al. (2006) carried out a frequency analysis of consecutive days' rainfall in Rajasthan, India, found the gamma distribution to be the best fit for the region, and estimated the corresponding return periods using the gamma function. Sabarish et al. (2017) found that the log-Pearson III distribution is the best-fit probability distribution for 1-day maximum rainfall over the southern part of India. Different probability distributions give mixed results at different locations and times (Lairenjam et al. 2016). The lognormal and Gumbel EV1 distributions were adopted for discharge in the Uttarakhand region (Kamal et al. 2016).

Probability distributions are applied widely, and their outcomes help to quantify risk, uncertainty and monetary loss. Tao et al. (2002) stated that various probability distributions have been developed to study the distribution of rainfall. However, the choice of the most suitable distribution is still a major challenge in hydrologic practice, since there is no general agreement as to which distribution should be used for annual rainfall series. The selection of an appropriate distribution depends mainly on the characteristics of the rainfall data available at the particular site. Hence, it is necessary to evaluate many candidate distributions in order to identify the one that best represents extreme rainfall. This study aimed to determine the best-fit distribution for the annual rainfall data in the different districts of Uttarakhand and to estimate its parameters.

Data and analysis

The monthly rainfall data for all districts of Uttarakhand were obtained for the period 1901–2002 from the meteorological data tool of the IWP (2015) website (http://www.indiawaterportal.org/met_data/). Uttarakhand State is mainly known for its two mountainous regions, Kumaon and Garhwal. Most of the state is under forest cover, and major rivers such as the Ganga and Yamuna originate here. All 13 districts of the state were selected for this study: Almora, Bageshwar, Chamoli, Champawat, Dehradun, Haridwar, Nainital, Pauri, Pithoragarh, Rudraprayag, Tehri, Udham Singh Nagar and Uttarkashi (Fig. 1). Basic characteristics of the districts, such as population and geographical area, are given in Table 1.

Fig. 1 Map of Uttarakhand

Table 1 Characteristics of Uttarakhand districts

Table 2 shows that annual rainfall in Uttarakhand State varies widely from district to district. The highest average annual rainfall, 2426.77 mm, occurred in Champawat, whereas the lowest, 406.70 mm, occurred in Haridwar. The time series graphs of the different districts show positive or negative correlation. The time series show two distinct peaks, in 1936 and 1980 (Fig. 2), which are characterized by different statistical behaviors.

Table 2 Summary of statistics for annual rainfall (1901–2002)
Fig. 2 Annual time series of rainfall

Methodology

One of the major concerns with rainfall records is interpreting past rainfall data in terms of future probabilities of occurrence. A large number of probability distributions have been applied in different regions and found useful for describing rainfall. The best-fit probability distribution in the present case was evaluated using the following procedure.

Step I: Fitting the probability distribution

The probability distributions, viz. chi-squared, chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull and Weibull (3P), were fitted to the data to identify the best-fit probability distribution.

Chi-squared distribution

The chi-squared distribution is given by:

$$f(x;n) = \frac{{\left( {\frac{x}{2}} \right)^{{\frac{n}{2} - 1}} e^{{\frac{ - x}{2}}} }}{{2\varGamma \left( {\frac{n}{2}} \right)}},$$
(1)

where the variable \(x \ge 0\) and the parameter \(n\), the number of degrees of freedom, is a positive integer.

Exponential distribution

The exponential distribution is given by:

$$f(x;\alpha ) = \frac{1}{\alpha }e^{{\frac{ - x}{\alpha }}} ,$$
(2)

where the variable \(x\) as well as the parameter \(\alpha\) is a positive real quantity.

Gamma distribution

The gamma distribution is given by:

$$f(x;a,b) = a(ax)^{b - 1} e^{ - ax} /\varGamma (b),$$
(3)

where the parameters a and b are positive real quantities as is the variable x.
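The estimation method is not stated here. As one possibility, a method-of-moments sketch for Eq. (3) is given below; it relies on the mean \(b/a\) and variance \(b/a^{2}\) of that parameterization. The function name `fit_gamma_moments` and the sample values are illustrative assumptions, not data or code from this study.

```python
import statistics

def fit_gamma_moments(sample):
    """Method-of-moments estimates (a, b) for the gamma density of Eq. (3).

    In that parameterization the mean is b / a and the variance is b / a**2,
    so a = mean / variance and b = mean**2 / variance.
    """
    mean = statistics.fmean(sample)
    var = statistics.pvariance(sample)  # population variance (divides by n)
    if var <= 0:
        raise ValueError("sample must contain at least two distinct values")
    return mean / var, mean ** 2 / var

# Made-up annual totals (mm), for illustration only.
rain = [900.0, 1100.0, 1000.0, 1200.0, 800.0]
a_hat, b_hat = fit_gamma_moments(rain)
print(f"a = {a_hat:.4f} per mm, b = {b_hat:.1f}")
```

Maximum likelihood is a common alternative; the moment estimates above are often used as its starting values.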

Generalized extreme value (GEV) distribution

The class of GEV distributions is very flexible, with the tail shape parameter ξ (and hence the tail index, defined as \(\alpha = \xi^{ - 1}\)) controlling the shape and size of the tails:

$$F_{\xi } (x) = \exp \left( { - (1 + \xi x)^{ - 1/\xi } } \right)\quad {\text{with}}\quad 1 + \xi x > 0,\quad \xi \ne 0.$$
(4)

The standardized GEV distribution, in the form of von Mises (1936), incorporates a location parameter \(\mu\) and a scale parameter σ in addition to the tail shape parameter ξ, and is given by:

$$F_{\xi ,\mu ,\sigma } (x) = \exp \left( { - \left( {1 + \xi \frac{x - \mu }{\sigma }} \right)^{ - 1/\xi } } \right).$$
(5)
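Equation (5) can be inverted in closed form, which links the fitted distribution to the return periods mentioned in the Introduction: the T-year return level is the quantile at non-exceedance probability \(1 - 1/T\). The sketch below assumes \(\xi \ne 0\); the function names and the parameter values are hypothetical and are not fitted values from this study.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """Eq. (5); with mu=0 and sigma=1 it reduces to Eq. (4). Requires xi != 0."""
    z = 1.0 + xi * (x - mu) / sigma
    if z <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(z ** (-1.0 / xi)))

def gev_return_level(T, xi, mu, sigma):
    """Value exceeded on average once every T years (quantile at p = 1 - 1/T)."""
    p = 1.0 - 1.0 / T
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

# Hypothetical parameters (mm), for illustration only.
xi, mu, sigma = 0.1, 1000.0, 200.0
x100 = gev_return_level(100, xi, mu, sigma)
print(round(x100, 1), round(gev_cdf(x100, xi, mu, sigma), 4))
```

Evaluating the CDF at the computed return level should recover \(1 - 1/T\), which is a quick consistency check on the inversion.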

Log-Pearson type 3 distribution

The log-Pearson 3 (LP3) distribution is complicated, as it has two interacting shape parameters (Griffis and Stedinger 2007). Similar to GEV, it uses three parameters: location (µ), scale (σ) and shape (γ). A problem with LP3 is its tendency to give low upper bounds on precipitation magnitudes, which is undesirable (Curmane 1989). Its density is given by:

$$f(x) = \frac{{\left( {\xi - x} \right)^{\alpha - 1} e^{{ - \left( {\xi - x} \right)/\beta }} }}{{\beta^{\alpha } \varGamma (\alpha )}},$$
(6)

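where, in the usual moment parameterization of the Pearson type 3 family,

$$\xi = \mu - 2\sigma /\gamma ,\quad \alpha = 4/\gamma^{2} ,\quad \beta = \sigma \left| \gamma \right|/2,$$

and the form shown in Eq. (6) applies for \(x \le \xi\), i.e., for negative skew (\(\gamma < 0\)).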

Weibull distribution

The Weibull distribution is given by:

$$f\left( {x;\eta ,\sigma } \right) = \frac{\eta }{\sigma }\left( {\frac{x}{\sigma }} \right)^{\eta - 1} e^{{ - \left( {\frac{x}{\sigma }} \right)^{\eta } }} ,$$
(7)

where the variable x and the parameters η and σ are all positive real numbers.
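As a consistency check on the parameterizations, the densities of Eqs. (1), (2), (3) and (7) can be coded directly. The sketch below uses only the Python standard library, with function names chosen for illustration, and verifies three special cases: the chi-squared density is a gamma density with \(a = 1/2\) and \(b = n/2\), and the exponential density is both a gamma density with \(b = 1\) and a Weibull density with \(\eta = 1\).

```python
import math

def chi_squared_pdf(x, n):
    """Eq. (1): chi-squared density with n degrees of freedom."""
    return (x / 2) ** (n / 2 - 1) * math.exp(-x / 2) / (2 * math.gamma(n / 2))

def exponential_pdf(x, alpha):
    """Eq. (2): exponential density with scale alpha."""
    return math.exp(-x / alpha) / alpha

def gamma_pdf(x, a, b):
    """Eq. (3): gamma density with rate a and shape b."""
    return a * (a * x) ** (b - 1) * math.exp(-a * x) / math.gamma(b)

def weibull_pdf(x, eta, sigma):
    """Eq. (7): Weibull density with shape eta and scale sigma."""
    return (eta / sigma) * (x / sigma) ** (eta - 1) * math.exp(-((x / sigma) ** eta))

x = 3.7  # arbitrary evaluation point
checks = {
    "chi-squared(n=4) == gamma(a=1/2, b=2)":
        math.isclose(chi_squared_pdf(x, 4), gamma_pdf(x, 0.5, 2.0)),
    "exponential(alpha=2) == gamma(a=1/2, b=1)":
        math.isclose(exponential_pdf(x, 2.0), gamma_pdf(x, 0.5, 1.0)),
    "exponential(alpha=2) == Weibull(eta=1, sigma=2)":
        math.isclose(exponential_pdf(x, 2.0), weibull_pdf(x, 1.0, 2.0)),
}
for label, ok in checks.items():
    print(label, ok)
```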

Step II: Testing the goodness of fit

The goodness-of-fit tests, namely the Kolmogorov–Smirnov, Anderson–Darling, and Chi-square tests, were used at the 5% significance level to select the best-fit distribution. The best-fitted distribution is the one that produces the minimum error, as evaluated by the following techniques:

(a) Kolmogorov–Smirnov test (K–S)

The Kolmogorov–Smirnov (K–S) test is a goodness-of-fit test that compares an empirical distribution function \((F_{x} )\) with a specified distribution function \((F_{y} )\). The test is often used as an alternative to the Chi-square goodness-of-fit test. The Kolmogorov–Smirnov statistic (D) can be computed as:

$$D = \mathop {\max }\limits_{x} \left| {F_{x} (x) - F_{y} (x)} \right|,$$
(8)

which measures the distance between the empirical distribution function \(F_{x}\) and the specified distribution \(F_{y}\). Obviously, a large difference indicates an inconsistency between the observed data and the statistical model.
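For a sample of size n, the maximum in Eq. (8) is attained at a data point, just before or just after a step of the empirical distribution function, so D can be computed from the ordered sample. The sketch below is a minimal illustration; the function names, the exponential reference distribution and the three sample values are assumptions made for the example.

```python
import math

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D of Eq. (8) for a sample and a specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(alpha):
    """CDF corresponding to the exponential density of Eq. (2)."""
    return lambda x: 1.0 - math.exp(-x / alpha)

sample = [0.5, 1.0, 2.0]  # made-up values
d = ks_statistic(sample, exponential_cdf(1.0))
print(round(d, 4))  # 0.3935
```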

(b) Anderson–Darling test (A–D)

The Anderson–Darling (A–D) test was introduced by Anderson and Darling to place more weight or discriminating power at the tails of the distribution. This can be important when the tails of the selected theoretical distribution are of practical significance. The test statistic (AD) is defined as:

$${\text{AD}} = - n - \frac{1}{n}\sum\limits_{i = 1}^{n} {(2i - 1)\left( {\ln F(x_{(i)} ) + \ln \left( {1 - F(x_{(n + 1 - i)} )} \right)} \right)} ,$$
(9)

where \(\left\{ {x_{(1)} < \cdots < x_{(n)} } \right\}\) is the ordered (from smallest to the largest element) sample of size n, and \(F(x)\) is the underlying theoretical cumulative distribution to which the sample is compared. The null hypothesis that \(\left\{ {x_{(1)} < \cdots < x_{(n)} } \right\}\) comes from the underlying distribution \(F(x)\) is rejected if AD is larger than the critical value \({\text{AD}}_{\alpha }\) at a given significance level \((\alpha )\).
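The statistic can be computed directly from the ordered sample and the theoretical CDF. The sketch below is a minimal illustration with a hypothetical function name; it uses the uniform CDF on [0, 1] only because the result is then easy to verify by hand, and it does not compute the critical value \({\text{AD}}_{\alpha }\), which depends on the distribution being tested.

```python
import math

def anderson_darling(sample, cdf):
    """Anderson-Darling statistic of Eq. (9) for a sample and a specified CDF.

    Raises ValueError if the CDF evaluates to exactly 0 or 1 at a data point,
    because the logarithms are then undefined.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_i = cdf(xs[i - 1])    # F(x_(i))
        f_rev = cdf(xs[n - i])  # F(x_(n+1-i))
        total += (2 * i - 1) * (math.log(f_i) + math.log(1.0 - f_rev))
    return -n - total / n

uniform_cdf = lambda u: u  # CDF of the uniform distribution on [0, 1]
ad = anderson_darling([0.25, 0.75], uniform_cdf)
print(round(ad, 4))
```

For a single observation at the median, the formula reduces to \(2\ln 2 - 1\), which gives a simple hand check.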

(c) Chi-square (\(\chi^{2}\)) test

This test checks whether the observed frequencies in a sample are consistent with the frequencies expected under a specified distribution. Using O to denote the "observed count" and E the "expected count" in each class, the Chi-square test statistic is calculated by:

$$\chi^{2} = \sum {\frac{{(O - E)^{2} }}{E}} .$$
(10)

The null hypothesis states that there is no significant difference between the expected and observed frequencies. The alternative hypothesis states that they are different.
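As a minimal sketch of Eq. (10), the expected count in each class can be taken as the sample size times the probability that the specified distribution assigns to that class. The class edges, the uniform reference distribution and the observed counts below are made up for illustration, and the function names are hypothetical.

```python
def expected_counts(edges, cdf, n):
    """Expected count per class: n times the probability mass between edges."""
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]

def chi_square_statistic(observed, expected):
    """Eq. (10): sum over classes of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up example: 100 values tested against a uniform distribution on [0, 100].
edges = [0, 20, 40, 60, 80, 100]
uniform_cdf = lambda x: x / 100.0
observed = [18, 22, 20, 25, 15]
expected = expected_counts(edges, uniform_cdf, n=sum(observed))
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))
```

The statistic is then compared with the critical value of the chi-squared distribution, whose degrees of freedom are the number of classes minus one minus the number of estimated parameters.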

Step III: Identification of best-fit probability distribution

The goodness-of-fit tests mentioned above were applied to the rainfall data of the study area. The test statistics were computed and tested at the α = 0.05 level of significance. The probability distributions were then ranked on the basis of the minimum test statistic value. The probability density functions, ranges and parameters of the various distributions are shown in Tables 3 and 4.

Table 3 Distribution type and their parameters
Table 4 Goodness of fit summary
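The ranking step can be sketched as follows. The exact scoring rule is not spelled out in the text, so this example assumes that, for each test, the distribution with the smallest statistic receives the highest score and that the scores are summed over the tests. The statistic values are hypothetical and ties are not handled.

```python
def total_scores(statistics_by_test):
    """Sum per-test scores; within a test, the smallest statistic scores highest.

    statistics_by_test maps test name -> {distribution name: test statistic}.
    """
    totals = {}
    for stats in statistics_by_test.values():
        ordered = sorted(stats, key=stats.get)  # smallest statistic first
        k = len(ordered)
        for position, name in enumerate(ordered):
            totals[name] = totals.get(name, 0) + (k - position)
    return totals

# Hypothetical test statistics for one district (not values from this study).
stats = {
    "Kolmogorov-Smirnov": {"Weibull": 0.05, "Gamma": 0.07, "GEV": 0.06},
    "Anderson-Darling": {"Weibull": 0.40, "Gamma": 0.55, "GEV": 0.35},
    "Chi-square": {"Weibull": 3.1, "Gamma": 2.8, "GEV": 4.0},
}
totals = total_scores(stats)
best = max(totals, key=totals.get)
print(totals, best)
```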

Results and discussion

Analysis of rainfall data

Analysis of rainfall data plays an important role in water resource planning and hydrological modeling. The mean monthly rainfall data for all districts of Uttarakhand for 102 years (1901–2002) were used in the present study. Figure 3 displays the annual rainfall recorded for the whole of Uttarakhand over 1901–2002. The average annual rainfall over this period is 1069 mm. The highest annual rainfall was about 1982.15 mm in 1936, whereas the lowest was about 559.98 mm in 1987. The dark line in the figure represents the average annual rainfall. If the annual rainfall in a year falls below the average annual rainfall by 25% or more, the year is declared a drought (meteorological drought) year (Subramanya 2008). On the basis of a 25% departure from the average annual rainfall, 30.4% of the years were dry years. Rainfall from July to September was sufficient to meet evapotranspiration demand, whereas rainfall from October to June was not.
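The drought criterion can be expressed as a simple threshold test. The sketch below assumes that a drought year is one whose annual rainfall is at most 75% of the long-term mean; the function name is hypothetical, the 1936 and 1987 totals and the 1069 mm mean are taken from the text, and the other two values are made up.

```python
def drought_years(annual_rainfall, mean=None, deficit=0.25):
    """Years whose rainfall is at least `deficit` (as a fraction) below the mean."""
    if mean is None:
        mean = sum(annual_rainfall.values()) / len(annual_rainfall)
    threshold = (1.0 - deficit) * mean
    return sorted(year for year, mm in annual_rainfall.items() if mm <= threshold)

rain = {
    1936: 1982.15,  # from the text
    1987: 559.98,   # from the text
    1950: 1100.0,   # made-up value
    1951: 801.0,    # made-up value, just under the 801.75 mm threshold
}
print(drought_years(rain, mean=1069.0))  # [1951, 1987]
```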

Fig. 3 Average annual precipitation of Uttarakhand from 1901 to 2002

To identify the seasonal rainfall distribution, the year was divided into three periods, namely monsoon (June–September), post-monsoon (October–February) and pre-monsoon (March–May). Figure 4 shows the seasonal variation of the rainfall. The area receives about 82% of the total annual rainfall during the monsoon season, 10% during the post-monsoon season, and 8% during the pre-monsoon season. Because about 82% of the rainfall occurs in the four monsoon months, crops suffer from moisture stress in the remaining 8 months. Therefore, it is necessary to predict the expected rainfall in order to design water conservation systems. Based on the drought criterion, the years 1903, 1918, 1941, 1944, 1965, 1974, 1979, 1987, 1991 and 2001 can be characterized as drought years for Uttarakhand State.
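The seasonal shares follow from summing the monthly totals over each season and dividing by the annual total. In the sketch below, the monthly values are made up and were chosen only so that the result reproduces the 82/10/8% split reported above; they are not the study data, and the names are hypothetical.

```python
SEASONS = {
    "monsoon": ("Jun", "Jul", "Aug", "Sep"),
    "post-monsoon": ("Oct", "Nov", "Dec", "Jan", "Feb"),
    "pre-monsoon": ("Mar", "Apr", "May"),
}

def seasonal_shares(monthly_mm):
    """Percentage of the annual total that falls in each season."""
    total = sum(monthly_mm.values())
    return {
        season: 100.0 * sum(monthly_mm[m] for m in months) / total
        for season, months in SEASONS.items()
    }

# Made-up mean monthly rainfall (mm); the annual total is 1000 mm.
monthly = {
    "Jan": 30.0, "Feb": 30.0, "Mar": 20.0, "Apr": 20.0, "May": 40.0,
    "Jun": 150.0, "Jul": 300.0, "Aug": 270.0, "Sep": 100.0,
    "Oct": 20.0, "Nov": 10.0, "Dec": 10.0,
}
shares = seasonal_shares(monthly)
print(shares)
```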

Fig. 4 Seasonal variation of rainfall

Probability distribution

To identify the best distribution of monthly rainfall, data for all districts of Uttarakhand for 102 years (1901–2002) were analyzed, and a probability analysis of the monthly rainfall series was carried out. The candidate distributions, namely chi-squared, chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull and Weibull (3P), were fitted. Table 3 shows the parameters of the different distributions. To obtain the best-fit distribution for each rainfall series, the Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared goodness-of-fit tests were applied. The assessment of the best probability distribution was based on the total rank obtained from all the tests. Each distribution was given a score from zero to ten (0–10), and the distribution(s) with the highest total score were chosen as the best model(s) for the data of a particular district. According to the goodness-of-fit tests, the Weibull distribution best fitted the rainfall of Almora, Bageshwar, Nainital and Udham Singh Nagar districts; the Chi-squared (2P) distribution best fitted Chamoli, Champawat and Haridwar; the gamma (3P) distribution best fitted Dehradun and Pauri Garhwal; log-Pearson 3 best fitted Pithoragarh and Tehri Garhwal; and the Weibull (3P) distribution best fitted Uttarkashi. The generalized extreme value distribution gave the best fit during the monsoon (June–September). The best-fit distribution is used to quantify the risk and uncertainty associated with the modeling and planning of water resources. It also helps in developing valid models that could save time and reduce economic losses.

Conclusion

Uttarakhand State faces rapid conversion of rainfall to surface runoff because of its steep slopes, as well as landslides. To help cope with these issues, a systematic analysis was carried out to select the best-fit probability distribution for the annual rainfall series of the 13 districts of Uttarakhand for the period 1901–2002. The choice of the best probability distribution could also be used to inform decisions relating to local economics and hydrologic safety systems. The annual rainfall series of all districts of the state were fitted by the Chi-squared, Chi-squared (2P), exponential, exponential (2P), gamma, gamma (3P), generalized extreme value (GEV), log-Pearson 3, Weibull and Weibull (3P) distributions, and the distributions were compared using the Kolmogorov–Smirnov, Anderson–Darling, and Chi-squared goodness-of-fit tests.

The goodness-of-fit analysis indicated that the Weibull, Weibull (3P), Chi-squared (2P), gamma (3P), and log-Pearson 3 distributions were suitable for 31, 15, 24, 15, and 15% of the districts, respectively. This study could provide a basis for choosing the best probability distribution and the corresponding distribution parameters for individual districts. Furthermore, the seasonal rainfall distribution reveals that the area receives about 82% of the total annual rainfall during the monsoon season, 10% during the post-monsoon season, and 8% during the pre-monsoon season. These preliminary results will help water resource planners in hydrological modeling and policy makers in framing general guidelines for the best use of rainfall in Uttarakhand.