Multimodel response assessment for monthly rainfall distribution in some selected Indian cities using best-fit probability as a tool
Abstract
We carry out a study of the statistical distribution of rainfall precipitation data for 20 cities in India. We have determined the best-fit probability distribution for these cities from the monthly precipitation data spanning about 100 years of observations, from 1901 to 2002. To fit the observed data, we considered 10 different distributions. The efficacy of the fits for these distributions was evaluated using five model comparison tests: three nonparametric goodness-of-fit tests (Kolmogorov–Smirnov, Anderson–Darling and chi-square) and two information criteria (Akaike and Bayesian). Finally, the best-fit distribution according to each of these tests was reported, and the results from the model comparison tests were then combined. We find that for most of the cities, the generalized extreme value distribution or the inverse Gaussian distribution most adequately fits the observed data.
Keywords
Rainfall statistics, KS test, Anderson–Darling test, AIC, BIC
Introduction
Establishing a probability distribution that provides a good fit to the monthly average precipitation has long been a topic of interest in the fields of hydrology, meteorology and agriculture (Fisher 1925). Knowledge of precipitation at a given location is an important prerequisite for agricultural planning and management. Rainfall is the main form of precipitation, so studies of precipitation provide invaluable knowledge about rainfall statistics. For rainfed agriculture, rainfall is the single most important agrometeorological variable influencing crop production (Wallace 2000; Rockström et al. 2003). In the absence of reliable physically based seasonal forecasts, crop management decisions and planning have to rely on statistical assessment based on the analysis of historical precipitation records. It has been shown by Fisher (1925) that the statistical distribution of rainfall is more important than the total amount of rainfall for the yield of crops. Therefore, detailed statistical studies of rainfall data for a variety of countries have been carried out for more than 70 years, along with fits to multiple probability distributions (Ghosh et al. 2016; Sharma and Singh 2010; Nguyen et al. 2002). We recap some of these studies for stations both in India and outside India.
Mooley and Appa Rao (1970) first carried out a detailed statistical analysis of the rainfall distribution during southwest and northeast monsoon seasons at selected stations in India with deficient rainfall, and found that the gamma distribution provides the best fit. Stephenson et al. (1999) showed that the outliers in the rainfall distribution for the summers of 1986–1989 throughout India can be well fitted by the gamma and Weibull distributions. Deka et al. (2009) found that the logistic distribution is the optimum distribution for the annual rainfall distribution for seven districts in northeast India. Sharma and Singh (2010) found, based on daily rainfall data for Pantnagar spanning 37 years, that the lognormal and gamma distributions provide the best-fit probability distributions for the annual and monsoon months, whereas the generalized extreme value distribution provides the best fit when only the weekly data are considered. Most recently, Kumar et al. (2017) analyzed the statistical distribution of rainfall in Uttarakhand, India, and found that the Weibull distribution performed the best. However, one caveat with some of the above studies is that only a handful of distributions were considered for fitting the rainfall data, and sometimes no detailed model comparison tests were done to find the most adequate distribution.
A large number of statistical studies have similarly been done for rainfall precipitation data for stations outside India. For brevity, we only mention a few selected studies to illustrate the diversity in the best-fit distributions found from these studies. In Costa Rica, the normal distribution provided the best fit to the annual rainfall distribution (Waylen et al. 1996). A generalized extreme value distribution has been used for Louisiana (Naghavi and Yu 1995). The gamma distribution provided the best fit for rainfall data in Saudi Arabia (Abdullah and Al-Mazroui 1998), Sudan (Mohamed and Ibrahim 2015) and Libya (Şen and Eljadid 1999). Mahdavi et al. (2010) studied the rainfall statistics for 65 stations in the Mazandaran and Golestan provinces in Iran and found that the Pearson and log-Pearson distributions provide the best fits to the data. Nadarajah and Choi (2007) found that the Gumbel distribution provides the most reasonable fit to the data in South Korea. Ghosh et al. (2016) found that the extreme value distribution provides the best fit to the Chittagong monthly rainfall data during the rainy season, whereas for Dhaka, the gamma distribution provides a better fit.
Therefore, we can see from this whole slew of studies that no single distribution can accurately describe the rainfall distribution. The selection depends on the characteristics of the available rainfall data as well as the statistical tools used for model selection.
The main objective of the current study is to complement the above studies and to determine the best-fit probability distribution for the monthly average precipitation data of 20 selected stations throughout India, using multiple goodness-of-fit tests.
Datasets and methodology
Distribution  Probability density function

Normal  \(f(x) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp \left[-\frac{(x-\mu )^2}{2\sigma^2}\right]\)
Lognormal  \(f(x) = \frac{1}{x\sqrt{2\pi \sigma^2}}\exp \left[-\frac{(\ln x-\mu )^2}{2\sigma^2}\right]\)
Gamma  \(f(x) = \frac{1}{\theta^{k}}\frac{x^{k-1}\exp (-x/\theta )}{\Gamma (k)}\)
Inverse Gaussian  \(f(x) = \left(\frac{\lambda }{2\pi x^3}\right)^{0.5}\exp \left[-\frac{\lambda (x-\mu )^2}{2\mu ^2x}\right]\)
GEV  \(f(x) = \frac{1}{\sigma }\left[1-k\frac{x-\mu }{\sigma }\right]^{1/k-1}\exp \left\{-\left[1-k\frac{x-\mu }{\sigma }\right]^{1/k}\right\}\)
Gumbel  \(f(x) = \frac{1}{\beta }\exp \left[-\left(z+\exp (-z)\right)\right],\quad z=\frac{x-\mu }{\beta }\)
Student’s t  \(f(x) = \frac{\Gamma \left(\frac{v+1}{2}\right)}{\sqrt{v\pi }\,\Gamma \left(v/2\right)}\left(1+\frac{x^2}{v}\right)^{-\frac{v+1}{2}}\)
Beta  \(f(x) = \frac{x^{\alpha -1}(1-x)^{\beta -1}}{\frac{\Gamma (\alpha )\Gamma (\beta )}{\Gamma (\alpha +\beta )}}\)
Weibull  \(f(x) = \frac{k}{\lambda }\left(\frac{x}{\lambda }\right)^{k-1}e^{-\left(\frac{x}{\lambda }\right)^{k}}\)
Fisher  \(f(x) = \frac{\sqrt{\frac{(d_{1}x)^{d_{1}}d_{2}^{d_{2}}}{(d_{1}x+d_{2})^{d_{1}+d_{2}}}}}{x\,B\left(\frac{d_{1}}{2},\frac{d_{2}}{2}\right)}\)
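As an illustration of how the ten densities above could be evaluated in practice, the following Python sketch maps each table entry to a `scipy.stats` object. The mapping (for example, Fisher to `scipy.stats.f` and Weibull to `scipy.stats.weibull_min`) is our assumption and is not taken from the analysis codes. The sketch also checks numerically that the GEV density, written with shape \(k\), location \(\mu\) and scale \(\sigma\) as in the table, agrees with `scipy.stats.genextreme`; the parameter values are purely illustrative.

```python
import numpy as np
from scipy import stats

# Assumed mapping from the table's names to scipy.stats distributions.
CANDIDATES = {
    "Normal": stats.norm,
    "Lognormal": stats.lognorm,
    "Gamma": stats.gamma,
    "Inverse Gaussian": stats.invgauss,
    "GEV": stats.genextreme,
    "Gumbel": stats.gumbel_r,
    "Student's t": stats.t,
    "Beta": stats.beta,
    "Weibull": stats.weibull_min,
    "Fisher": stats.f,
}


def gev_pdf(x, k, mu, sigma):
    """GEV density with shape k, location mu and scale sigma, as in the table."""
    y = 1.0 - k * (x - mu) / sigma
    return (1.0 / sigma) * y ** (1.0 / k - 1.0) * np.exp(-(y ** (1.0 / k)))


# Illustrative parameter values only; these are not fitted to any station.
x = np.linspace(0.0, 300.0, 7)
k, mu, sigma = -0.2, 50.0, 40.0

table_pdf = gev_pdf(x, k, mu, sigma)
scipy_pdf = stats.genextreme.pdf(x, k, loc=mu, scale=sigma)
max_abs_diff = float(np.max(np.abs(table_pdf - scipy_pdf)))
print(max_abs_diff)
```

The agreement indicates that the shape parameter of `genextreme` follows the same sign convention as \(k\) in the table.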
Model comparison tests
We use multiple model comparison methods to carry out hypothesis testing and select the best distribution for the precipitation data.
For this purpose, the goodness-of-fit tests used include nonparametric distribution-free tests such as the Kolmogorov–Smirnov test, Anderson–Darling test and chi-square test, and information-criterion tests such as the Akaike and Bayesian information criteria. For each of the probability distributions, we find the best-fit parameters for each of the stations using least-squares fitting and then carry out each of these tests. We now describe these tests.
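A minimal sketch of the parameter-estimation step is given below. The text above states that least-squares fitting was used. This sketch instead uses the maximum-likelihood `fit` method of `scipy.stats`, which is a swapped-in technique chosen for brevity. The data are synthetic stand-ins for one station's monthly totals, and pinning the location parameter at zero is our assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for one station's monthly rainfall (mm); not real data.
rain = rng.gamma(shape=0.8, scale=120.0, size=1200)

# Maximum-likelihood fit of a gamma distribution, location fixed at zero.
shape, loc, scale = stats.gamma.fit(rain, floc=0)

# Log-likelihood at the fitted parameters, reusable by information criteria.
log_likelihood = float(
    np.sum(stats.gamma.logpdf(rain, shape, loc=loc, scale=scale))
)
print(shape, scale, log_likelihood)
```

The same pattern applies to any other `scipy.stats` continuous distribution, since they all expose `fit` and `logpdf`.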
Kolmogorov–Smirnov test
The K–S test is based on the maximum distance (or supremum) between the empirical distribution function of the data and the cumulative distribution function of the hypothesized distribution. An attractive feature of this test is that the distribution of the K–S test statistic itself does not depend on the statistics of the parent distribution from which the samples are drawn. Some limitations are that it applies only to continuous distributions and tends to be more sensitive near the center of the distribution than at the tails.
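The statistic described above can be sketched as follows, using `scipy.stats.kstest` on synthetic data and, for comparison, a direct computation of the largest gap between the empirical distribution function and the fitted cumulative distribution function. The Weibull choice and all parameter values are illustrative assumptions. Note that when the parameters are estimated from the same sample, the p value returned by `kstest` is only approximate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic sample (mm); not station data.
rain = rng.weibull(1.2, size=600) * 100.0

# Fit a Weibull distribution, then test the sample against the fitted CDF.
params = stats.weibull_min.fit(rain, floc=0)
ks_stat, p_value = stats.kstest(rain, "weibull_min", args=params)

# Same statistic by hand: supremum distance between the EDF and the model CDF.
x = np.sort(rain)
n = x.size
cdf = stats.weibull_min.cdf(x, *params)
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)
print(ks_stat, d_manual, p_value)
```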
Anderson–Darling test
Chi-square test
AIC and BIC
The Akaike information criterion (AIC) (Liddle 2004; Kulkarni and Desai 2017) is a way of selecting a model from an input set of models. It can be derived by an approximate minimization of the Kullback–Leibler distance between the model and the truth. It is based on information theory, but a heuristic way to think about it is as a criterion that seeks a model that fits the truth well with as few parameters as possible.
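For concreteness, the sketch below computes the information criteria from a maximized log-likelihood using their standard definitions: AIC \(= 2p - 2\ln L\), the small-sample corrected AICc \(= \mathrm{AIC} + 2p(p+1)/(N-p-1)\), and BIC \(= p\ln N - 2\ln L\), where \(p\) is the number of free parameters and \(N\) the number of data points. The helper name `information_criteria`, the synthetic data and the two candidate distributions are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, AICc, BIC) for a maximized log-likelihood."""
    aic = 2.0 * n_params - 2.0 * log_likelihood
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = n_params * np.log(n_obs) - 2.0 * log_likelihood
    return aic, aicc, bic


rng = np.random.default_rng(2)
rain = rng.gamma(shape=0.9, scale=100.0, size=500)  # synthetic, not real data

scores = {}
for name, dist in {"gamma": stats.gamma, "weibull_min": stats.weibull_min}.items():
    params = dist.fit(rain, floc=0)           # maximum-likelihood fit
    loglike = np.sum(dist.logpdf(rain, *params))
    n_free = len(params) - 1                  # location was held fixed
    scores[name] = information_criteria(loglike, n_free, rain.size)

# The preferred model is the one with the smallest value of each criterion.
for name, (aic, aicc, bic) in scores.items():
    print(name, aic, aicc, bic)
```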
Results and discussion
Summary statistics of monthly precipitation data for the selected stations during the years 1901–2002. All dimensional quantities are in mm
Study location  Min.  Max.  Mean  SD  Coeff. of variation  Coeff. of skewness  Kurtosis

Kohima  0  802.43  196.33  177.67  0.91  0.77  −0.24
Jaipur  0  517.61  48.6  83.53  1.72  2.28  5.26 
Kolkata  0  892.15  132.15  148.63  1.13  1.31  1.474 
Raipur  0  635.98  105.38  140.33  1.33  1.33  0.72 
Gandhinagar  0  694.2  56.42  105.18  1.86  2.33  5.36 
Hyderabad  0  544.26  70.06  89.41  1.28  1.53  2.19 
Aizawl  0  1065.92  227.2  221.48  0.98  0.8  −0.311
Bhopal  0  725.72  89.53  140.91  1.57  1.73  2.18 
Ahmednagar  0  611.13  70.73  96.63  1.37  1.58  2.33 
Cuttack  0  506.19  106.32  115.32  1.09  0.91  −0.34
Chennai  0  768.91  96.89  118.27  1.22  1.99  4.82 
Bangalore  0  360.95  69.89  68.66  0.98  1.08  0.78 
Patna  0  534.69  90.96  121.9  1.34  1.39  0.9 
Amritsar  0  416.06  39.16  59.15  1.51  2.61  8.02 
Guntur  0  438.45  65.66  74.58  1.14  1.44  2.24 
Lucknow  0  619.08  74.85  113.6  1.52  1.76  2.43 
Kurnool  0  374.53  45.19  53.93  1.19  1.85  4.69 
Jammu  0  704.43  60.88  83.41  1.37  2.59  8.35 
Delhi  0  511.54  47.45  80.67  1.7  2.47  6.58 
Panipat  0  463.83  43.58  69.103  1.59  2.33  5.87 
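The columns of the table above could be computed along the following lines. This is a sketch under stated assumptions: the function name `summarize` is hypothetical, the sample standard deviation (`ddof=1`) is our choice, and the kurtosis is taken to be the excess kurtosis, which we infer from the negative entries in the table.

```python
import numpy as np
from scipy import stats


def summarize(rain):
    """Summary statistics for one station's monthly rainfall series (mm)."""
    rain = np.asarray(rain, dtype=float)
    mean = rain.mean()
    sd = rain.std(ddof=1)  # sample standard deviation (assumed convention)
    return {
        "min": rain.min(),
        "max": rain.max(),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,                   # coefficient of variation
        "skewness": stats.skew(rain),
        "kurtosis": stats.kurtosis(rain),  # excess kurtosis (normal gives 0)
    }


# Toy series for illustration only.
summary = summarize([0.0, 10.0, 20.0, 30.0, 40.0])
print(summary)
```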
Station-wise best-ranked probability distribution using different goodness-of-fit tests
Study location  KS  AD  Chi-square  AIC(c)  BIC

Patna  F  F  GEV  F  Beta 
Kurnool  F  F  Weibull  F  Beta 
Jaipur  F  F  Inv. Gauss  F  Beta 
Chennai  F  F  Gamma  F  Beta 
Hyderabad  F  F  Inv. Gauss  F  Beta
Lucknow  F  F  Inv. Gauss  F  Beta 
Bangalore  F  F  Weibull  F  Beta 
Kohima  Weibull  Beta  Beta  Weibull  Beta 
Aizawl  Weibull  Beta  Gamma  Weibull  Beta 
Guntur  F  F  F  F  Beta 
Panipat  F  F  GEV  F  Beta 
Amritsar  F  F  Inv. Gauss  F  Beta 
Cuttack  F  F  GEV  F  Beta 
Gandhinagar  F  F  Beta  GEV  t 
Ahmednagar  F  F  Inv. Gauss  t  Beta 
Raipur  F  F  GEV  F  Beta 
Jammu  F  F  Weibull  F  Beta 
Kolkata  F  F  F  F  Beta 
Bhopal  F  F  Inv. Gauss  F  Beta 
Delhi  F  F  Inv. Gauss  F  Beta 
Station-wise best-fit distribution obtained by summing the ranks of each of the distributions from all the model comparison tests considered in Table 3
Study location  Best fit 

Kohima  Genextreme 
Jaipur  Invgauss 
Kolkata  Genextreme 
Raipur  Genextreme 
Gandhinagar  Genextreme
Hyderabad  Invgauss 
Aizawl  Gamma 
Bhopal  Invgauss 
Ahmednagar  Invgauss 
Cuttack  Genextreme 
Chennai  Invgauss 
Bangalore  Genextreme
Patna  Genextreme 
Amritsar  Invgauss 
Guntur  Gumbel 
Lucknow  Invgauss 
Kurnool  Gumbel 
Jammu  Invgauss 
Delhi  Invgauss 
Panipat  Genextreme 
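The rank-summing step behind the table above can be sketched as follows. The scores are made-up numbers for a single hypothetical station, and we assume that each column is oriented so that a lower value means a better fit. The distribution with the smallest total rank is selected.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for one station: rows are distributions, columns are
# tests (K-S D, A^2, chi-square, AICc, BIC). Lower is better in every column.
names = ["genextreme", "invgauss", "gamma"]
scores = np.array([
    [0.04, 1.2, 15.0, 1010.0, 1022.0],
    [0.05, 1.0, 18.0, 1012.0, 1020.0],
    [0.09, 2.5, 30.0, 1030.0, 1040.0],
])

# Rank the distributions within each test, then sum the ranks across tests.
ranks = stats.rankdata(scores, axis=0)
total_rank = ranks.sum(axis=1)
best = names[int(np.argmin(total_rank))]
print(dict(zip(names, total_rank)), best)
```

If a test were ranked by p value instead, the corresponding column would need to be negated first, because a larger p value indicates a better fit.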
Parameter estimates using sample L-moments [mean (L1), variance (L2), skewness (L3), kurtosis (L4)] of the best-fit distributions
Study location  Best fit  Mean (L1)  Variance (L2)  Skewness (L3)  Kurtosis (L4)

Kohima  GEV  196.33  98.56  0.21  0.02 
Jaipur  Inv Gauss  48.6  36.27  0.57  0.27 
Kolkata  GEV  132.15  77.77  0.33  0.07 
Raipur  GEV  105.38  70.01  0.43  0.09 
Gandhinagar  GEV  56.42  44.44  0.61  0.29 
Hyderabad  Inv Gauss  70.06  45.1  0.4  0.1 
Aizawl  Gamma  227.2  121.87  0.24  0.01 
Bhopal  Inv Gauss  89.53  65.42  0.53  0.18 
Ahmednagar  Inv Gauss  70.73  47.78  0.43  0.11 
Cuttack  GEV  106.32  62.07  0.3  0.01 
Chennai  Inv Gauss  96.89  58.01  0.39  0.17 
Bangalore  GEV  69.89  37.14  0.26  0.07 
Patna  GEV  90.96  60.58  0.44  0.11 
Amritsar  Inv Gauss  39.16  26.01  0.51  0.27 
Guntur  Gumbel  65.66  38.73  0.33  0.08 
Lucknow  Inv Gauss  74.85  53.11  0.51  0.18 
Kurnool  Gumbel  45.19  27.06  0.36  0.13 
Jammu  Inv Gauss  60.88  37.41  0.48  0.26 
Delhi  Inv Gauss  47.45  34.68  0.57  0.28 
Panipat  GEV  43.58  30.62  0.54  0.25 
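Sample L-moments of the kind tabulated above can be computed from probability-weighted moments of the sorted data (Hosking 1990). The sketch below is our own illustration and not the code used for the table. It returns the L-location, the L-scale, and the L-skewness and L-kurtosis as ratios to the L-scale, which is what the magnitudes of the tabulated L3 and L4 values suggest. The function name `sample_l_moments` is hypothetical.

```python
import numpy as np


def sample_l_moments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)

    # Unbiased probability-weighted moments b0..b3 of the ordered sample.
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum(
        (i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x
    ) / n

    # L-moments as linear combinations of the b_r.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


# Toy series for illustration only (needs at least four values).
l1, l2, t3, t4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(l1, l2, t3, t4)
```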

Using the K–S test (D), we find that the Fisher distribution provides a good fit to the monthly precipitation data for all cities except Kohima and Aizawl. For these two cities, the Weibull distribution provides the best fit.

Using the Anderson–Darling test (\(A^{2}\)), we find that the Fisher distribution is the best fit for all the cities except (again) Kohima and Aizawl, for which the beta distribution gives the best fit.

Using the chi-square test (\(\chi ^{2}\)), there is no one distribution that consistently provides the best fit for most of the cities. According to Table 3, inverse Gaussian is the optimum fit for seven cities, generalized extreme value for four cities, Weibull for three cities, and beta, Fisher and gamma for two cities each. The corresponding cities are listed in Table 3.

Using AICc, we find that the Fisher distribution provides the best fit for 16 cities. The exceptions again include Kohima and Aizawl, for which Weibull is the most appropriate distribution. The generalized extreme value distribution provides the best fit for Gandhinagar, whereas Student’s t-distribution provides the best fit for Ahmednagar.

For BIC, we find that the beta distribution provides the best fit for all cities except Gandhinagar, for which Student’s t-distribution provides the best fit.
Implementation
We have used the Python v2.7 environment, together with the NumPy, pandas, matplotlib and SciPy packages. Our codes to reproduce all these results can be found at http://goo.gl/hjYn1S. These can be easily applied to statistical studies of rainfall distribution for any other station.
Comparison to previous results
A summary of some of the previous studies of rainfall distribution for various stations in India is outlined in the introductory section. An apples-to-apples comparison to these results is not straightforward, since they have not used the same model comparison techniques or considered all the 10 distributions which we have used. Moreover, the datasets and durations they have used are also different. Nevertheless, we compare and contrast the salient features of our conclusions with the previous results.
Among the previous studies, Sharma and Singh (2010) have also found that the generalized extreme value distribution fits the weekly rainfall data for Pantnagar. We also find that this distribution provides the best fit for eight cities. The best-fit distribution which we found for Aizawl agrees with the results from Mooley and Appa Rao (1970), Kulandaivelu (1984) and Bhakar et al. (2006). None of the previous studies have found the inverse Gaussian or the Gumbel distribution to be an adequate fit to the rainfall data. However, this could be because these two distributions were not fitted to the observed data in any of the previous studies. The inverse Gaussian and Gumbel distributions have only recently been considered by Ghosh et al. (2016) and Nadarajah and Choi (2007) for fitting the rainfall data in Bangladesh and Korea, respectively. We hope our results spur future studies to consider these distributions for fitting rainfall data in India.
Conclusions
We carried out a systematic study to identify the best-fit probability distribution for the monthly precipitation data at twenty selected stations distributed uniformly throughout India. The data showed that the monthly precipitation across the stations ranged from 0 to about 1066 mm, which indicates a large dynamic range. Identifying the best parametric distribution for the monthly precipitation data could therefore have a wide range of applications in agriculture, hydrology, engineering design and climate research.
For each station, we fit the precipitation data to 10 distributions as described in Table 1. To determine the best fit among these distributions, we used five model comparison tests, namely the K–S test, Anderson–Darling test, chi-square test, and the Akaike and Bayesian information criteria. The results from these tests are summarized in Table 3. For each model comparison test, we ranked each distribution according to its p value and then added the ranks from all five tests. The best-fit distribution for each city is the one with the minimum total rank and is tabulated in Table 4. We find that no one distribution can adequately describe the rainfall data for all the stations. For nine cities, the inverse Gaussian distribution provides the best fit, whereas the generalized extreme value distribution can adequately fit the rainfall distribution for eight cities. Our study is the first to find the inverse Gaussian distribution to be the optimum fit for any station. Among the remaining cities, Gumbel and gamma distributions are the best fit for two cities and one city, respectively.
In the hope that this work would be of interest to researchers wanting to do similar analysis and to promote transparency in data analysis, we have made our analysis codes as well as data publicly available for anyone to reproduce these results as well as to do similar analysis on other rainfall datasets. These can be found at http://goo.gl/hjYn1S
References
Abdullah M, Al-Mazroui M (1998) Climatological study of the southwestern region of Saudi Arabia. I. Rainfall analysis. Clim Res 9:213–223
Bhakar S, Bansal AK, Chhajed N, Purohit R (2006) Frequency analysis of consecutive days maximum rainfall at Banswara, Rajasthan, India. ARPN J Eng Appl Sci 1(3):64–67
Cochran WG (1952) The \(\chi ^2\) test of goodness of fit. Ann Math Stat 23:315–345
Deka S, Borah M, Kakaty S (2009) Distributions of annual maximum rainfall series of North-East India. Eur Water 27(28):3–14
Fisher R (1925) The influence of rainfall on the yield of wheat at Rothamsted. Philos Trans R Soc Lond B Biol Sci 213(402–410):89–142
Ghosh S, Roy MK, Biswas SC (2016) Determination of the best fit probability distribution for monthly rainfall data in Bangladesh. Am J Math Stat 6(4):170–174
Hosking JR (1990) L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Ser B (Methodological) 52:105–124
Kahle D, Wickham H (2013) ggmap: spatial visualization with ggplot2. R J 5(1):144–161
Kulandaivelu R (1984) Probability analysis of rainfall and evolving cropping system for Coimbatore. Mausam 35(3):257
Kulkarni S, Desai S (2017) Classification of gamma-ray burst durations using robust model-comparison techniques. Astrophys Space Sci 362(4):70
Kumar V (2017) Statistical distribution of rainfall in Uttarakhand, India. Appl Water Sci 7:1–12
Liddle AR (2004) How many cosmological parameters. Mon Not R Astron Soc 351(3):L49–L53
Mahdavi M, Osati K, Sadeghi SAN, Karimi B, Mobaraki J (2010) Determining suitable probability distribution models for annual precipitation data (a case study of Mazandaran and Golestan provinces). J Sustain Dev 3(1):159
Mohamed TM, Ibrahim AAA (2015) Fitting probability distributions of annual rainfall in Sudan. J Sci Technol 17(2)
Mooley D, Appa Rao G (1970) Statistical distribution of pentad rainfall over India during monsoon season. Indian J Meteorol Geophys 21:219–230
Nadarajah S, Choi D (2007) Maximum daily rainfall in South Korea. J Earth Syst Sci 116(4):311–320
Naghavi B, Yu FX (1995) Regional frequency analysis of extreme precipitation in Louisiana. J Hydraul Eng 121(11):819–827
Nguyen VTV, Tao D, Bourque A (2002) On selection of probability distributions for representing annual extreme rainfall series. In: Global solutions for urban drainage, pp 1–10
Rockström J, Barron J, Fox P (2003) Water productivity in rainfed agriculture: challenges and opportunities for smallholder farmers in drought-prone tropical agroecosystems. Water Prod Agric Limits Oppor Improv 85199(669):8
Şen Z, Eljadid AG (1999) Rainfall distribution function for Libya and rainfall prediction. Hydrol Sci J 44(5):665–680
Sharma MA, Singh JB (2010) Use of probability distribution in rainfall analysis. NY Sci J 3(9):40–49
Stephens MA (1974) EDF statistics for goodness of fit and some comparisons. J Am Stat Assoc 69(347):730–737
Stephenson D, Kumar KR, Doblas-Reyes F, Royer J, Chauvin F, Pezzulli S (1999) Extreme daily rainfall events and their impact on ensemble forecasts of the Indian monsoon. Mon Weather Rev 127(9):1954–1966
VanderPlas J, Connolly AJ, Ivezić Ž, Gray A (2012) Introduction to astroML: machine learning for astrophysics. In: 2012 Conference on intelligent data understanding (CIDU). IEEE, pp 47–54
Wallace J (2000) Increasing agricultural water use efficiency to meet future food production. Agric Ecosyst Environ 82(1–3):105–119
Waylen PR, Quesada ME, Caviedes CN (1996) Temporal and spatial variability of annual precipitation in Costa Rica and the Southern oscillation. Int J Climatol 16(2):173–193
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.