# Entropy as a Variation of Information for Testing the Goodness of Fit


## Abstract

Increasing population and higher levels of human and industrial activity have affected water resources in recent decades. In addition, per capita demand for water in most countries is steadily increasing as more and more people achieve higher standards of living. Researchers need more information about water resources for their efficient use and effective management. In this respect, obtaining sufficient, accurate and timely information is highly significant for water resources planning and management, in parallel with the determination of the characteristics of water resources. To this end, useful and easily applicable methods have been explored to obtain optimum results, and several test techniques have been investigated to extract more information from available water resources data. In the presented study, the Informational Entropy method is introduced as an alternative method for testing the goodness of fit of probability functions. The study briefly details the applicability of the concept as a goodness-of-fit tool on various cases from different spatial regions with varying meteorological characteristics. For this purpose, mean precipitation data for 60 stations in Turkey are investigated. The results of testing the goodness of fit of probability functions through the entropy-based method show that Informational Entropy can be applied for fitting the probability function to the investigated datasets.

## Keywords

Informational entropy · Probability analysis · Trend analysis · Goodness of fit · Water resources

## 1 Introduction

All attempts towards an efficient development of water resources within technical and economic perspectives require data and information on the quality and quantity of such resources. Accordingly, measuring and evaluating hydrologic processes that represent time-variant characteristics of water resources is highly significant. It is up to the planner to extract the maximum amount of information from these series and to determine the required parameters for the design and operation of water resources systems. The planner is not only concerned with the availability of data but also with the proper delineation of data collection networks. This is due to the fact that collection, processing and acquisition of hydrologic data require a significant amount of labor and investment. Thus, it is highly important that collected data bring *new* information.

Considering the limited financial resources to be allocated to the monitoring of data, emphasis must be put on the optimum selection of spatial and temporal monitoring frequencies. The collected data must also be assured to meet the objectives of water resources planning and development. Consequently, it may be stated that an objective criterion is needed to determine the amount of information conveyed by observed data, since all activities toward the planning of water resources rely on the data and information contained in available observations.

Hydrologic processes must be observed to achieve optimum decisions regarding the design and operation of water resources systems. On the other hand, data collection practices must also be realized on an optimum basis in delineating **what, where, when** and **how long** to measure. In relation to this need, planners have lately used the terms *expected information, increase of information,* or *deficiency of information* and related their design parameters to the information conveyed by available data. Information has often been expressed indirectly in terms of statistical parameters such as variance, standard error or correlation coefficient rather than in quantitative terms (Harmancioglu and Singh 1998).

Entropy is a measure of the degree of uncertainty of a random hydrological process. Since the reduction of uncertainty by means of observations is equal to the amount of information gained, the entropy criterion indirectly measures the information content of a given series of data. Entropy measures can also be employed to determine when a monitoring activity reaches an optimal point in time, after which new data produce no new information, or can be used for diagnostic checking in time series modeling (Baran and Bacanli 2006, 2007a, b).

Sharifdoost et al. (2009) also introduced an entropy method as an alternative test statistic for goodness of fit. The alternative method was compared with the classical Chi-squared, Kolmogorov-Smirnov and likelihood-ratio test statistics, and it was concluded that the entropy method is more sensitive than the usual statistics. Lee et al. (2011) proposed a test of fit based on maximum entropy; the test statistics are established, and a corrected form for small and medium sample sizes is obtained through Monte Carlo simulations on real time series. Abbas et al. (2012) identified the best fitting distribution to explain the annual maximum of daily rainfall for a specified time period. In their study, Gamma, Generalized Extreme Value (GEV) and Generalized Pareto (GP) distributions were fitted to annual maximum daily rainfall data from each station, and the performance of the distributions was evaluated using different goodness of fit (GOF) tests: the Chi-Square (CS), Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests. An empirical investigation of some tests of goodness of fit for the inverse Gaussian distribution was carried out by Best et al. (2012). Three goodness of fit tests, Chi-Square (C-S), Kolmogorov-Smirnov (K-S), and Anderson-Darling (A-D), were evaluated for frequency analysis by Zeng et al. (2015). On the other hand, an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models was introduced in a study by Weiss and Dardick (2016). As an alternative approach, Girardin and Lequesne (2017) detailed the mathematical justification of tests based on Shannon entropy (called S tests) and on relative entropy (called KL tests), showing their equivalence for testing any parametric composite null hypothesis of maximum entropy distributions. The methodology was applied to a real dataset of a DNA replication process to detect chicken cell lines.

By using the informational entropy method, the probability distribution of best fit to an observed time series can be evaluated (Baran and Barbaros 2015; Baran et al. 2017a, b). Originating from this feasibility, the presented study targets exploring the applicability of the concept as a goodness of fit tool on various cases from different spatial regions and varying meteorological characteristics. The study consists of two parts. The first part aims basically at showing the strength of informational entropy in indicating the variation of information. Originating from this intention, the mathematical formulation of Shannon’s entropy concept is employed as an indication of variation of information. The analyses to this end cover time series of meteorological variables, captured at meteorological monitoring stations located across a wide spatial extent of Turkey, examined through trend analysis approaches in order to obtain a primary indication of any persistent trend behavior in the series investigated. The second part relates to testing the fit of probability distribution functions of precipitation data in Turkey. For this purpose, 60 gaging stations having long-term precipitation data are investigated. In the probability analysis, the best fit results of probability distribution functions achieved through the informational entropy method are compared to the Chi-square test indicator.

## 2 Informational Entropy

### 2.1 The Role of the Entropy Concept in Evaluation of Hydrologic Data

In mathematical communication theory, the concept of and the measure for information content have been derived from statistical and probabilistic principles. Within this respect, the theory has also been referred to as Statistical Communication Theory.

Mathematical communication theory analyzes the statistical structure for information of a series of numbers, signs or symbols that make up a communication signal, without considering all their kind, meaning, value or any other subjective characteristics. The term *information content* here refers to the capability of signals to create communication, and the basic problem is the generation of correct communication by sending a sufficient amount of signals, leading neither to any loss nor to repetition of information (Cherry 1957; Pfeiffer 1965).

The general concept of information content H(n) was later named entropy (Shannon and Weaver 1949), as Shannon’s definition is very similar to the entropy function described in statistical mechanics (Cherry 1957; Pierce 1961; Pfeiffer 1965).

For a discrete random variable *X* with *N* possible outcomes of probabilities p(xᵢ), the information content is measured by Eq. (2):

H(X) = − Σᵢ p(xᵢ) ln p(xᵢ)  (2)

and *H*(*X*) satisfies the condition in Eq. (3):

0 ≤ H(X) ≤ ln N  (3)

where *N* is the number of events *X* assumes. The condition above indicates that the entropy function has upper (ln *N*) and lower (0, when *X* is deterministic) bounds, assuming positive values in between. The discrepancies encountered in practical applications of the concept essentially result from errors in the definition of entropy for continuous variables. The main difficulty associated with the applicability of the entropy concept in hydrology originates from the lack of a precise definition when dealing with continuous variables.
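As a quick numerical illustration of the bounds in Eq. (3), the following sketch computes the discrete Shannon entropy in nats; the probability vectors are illustrative examples, not data from the study:

```python
import math

def shannon_entropy(probs):
    """Discrete Shannon entropy H(X) = -sum(p_i * ln p_i), in nats."""
    assert abs(sum(probs) - 1.0) < 1e-9  # probabilities must sum to one
    return -sum(p * math.log(p) for p in probs if p > 0)

N = 8

# The uniform distribution over N events attains the upper bound ln N.
uniform = [1.0 / N] * N
h_uniform = shannon_entropy(uniform)

# A deterministic variable attains the lower bound 0.
deterministic = [1.0] + [0.0] * (N - 1)
h_det = shannon_entropy(deterministic)

# A skewed distribution falls strictly in between.
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]
h_skewed = shannon_entropy(skewed)

print(h_det, h_skewed, h_uniform, math.log(N))
```

The uniform case returns ln 8 exactly, confirming that maximum entropy corresponds to the most uncertain condition.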

### 2.2 Need for Measuring the Information Content of Hydrologic Processes

The entropy concept of Mathematical Communication Theory (also known as Information Theory) seems to offer such a criterion and appears to bring appropriate solutions to the aforementioned problems in water resources engineering. In the area of hydrology and water resources, a range of applications of entropy has been reported during the last decades. In entropy-based parameter estimation, the distribution parameters are expressed in terms of the given information (Singh 1997, 1998).

One of the most prominent uses of the entropy principle is the assessment of uncertainties in various aspects of hydrology, ranging from hydrological variables and model parameters to water resources systems in a much more general context. Indeed, the entropy concept finds a place in practices that include specific cases, such as the derivation of frequency distributions and parameter estimation, as well as broader cases such as hydrometric data network design. The most distinctive yield of entropy in such applications is its capacity for measuring uncertainty or information in quantitative terms (Harmancioglu et al. 1992; Harmancioglu and Singh 1998, 2002; Singh 1997, 2003).

On the other hand, researchers have also noted that some mathematical difficulties are encountered in the computation of various informational entropy measures. The major problem is the controversy associated with the mathematical definition of entropy for continuous probability distribution functions. In this case, the lack of a precise definition of informational entropy leads to further mathematical difficulties and, thus, hinders the applicability of the concept in hydrology. This problem needs to be resolved so that the informational entropy concept can be set on an objective and reliable theoretical basis and thereby achieve widespread use in the solution of water-resources problems based on information and/or uncertainty.

Baran et al. (2017b) have defined entropy as *the variation of information content* instead of *reduction of uncertainty ≡ amount of information gained.* The method is demonstrated on series of monthly observations from selected stream gaging stations. The results confirm that the new entropy definition leads to more reliable results and can be effectively used in the solution of water resources engineering problems related to uncertainty and information. The mathematical formulation developed does not depend on the use of discretizing intervals, so that a single value for the variation of information can be obtained. According to the authors, the new definition describes the concept not as an absolute measure of information but as a measure of the *variation of information.*

## 3 Entropy Concept as Variation of Information

### 3.1 Entropy Concept as Variation of Information for Continuous Variables

Two probability distributions *p* and *q* (*p*, *q* ∈ K) are considered in the probability space (Ω, K). Here, *q* represents a priori probabilities (i.e., probabilities prior to making observations). When a process is defined in such a probability space, the information conveyed when the process assumes a finite value *A* {*A* ∈ K} in the same probability space is expressed by Eq. (4). For mutually exclusive events (A₁, …, Aₙ) ∈ K, the entropy expression for any value Aₙ can be written as in Eq. (5). On this basis, the informational entropy *H*(*X*/*X*∗) of a random process *X* in the same probability space can be defined as in Eq. (7).

*H*(*X*/*X*∗) is in the form of conditional entropy, i.e., the entropy of *X* conditioned on *X*∗. Here, the condition is represented by an a priori probability distribution function, which can be described as the reference level against which the variation of information in the process can be measured.

In Eq. (8), *X** represents information available before making observations on the variable *X*, and *X* is the a posteriori information (i.e., information obtained by making observations). Similarly, *q*(*x*) is the a priori and *f*(*x*) the a posteriori probability density function for the random variable *X*.

Assume that both the a priori {*q*(*x*)} and a posteriori {*p*(*x*)} probability distribution functions of the random variable *X* are known. If the range of possible values of the continuous variable *X* is divided into *N* discrete and infinitesimally small intervals of width Δx, the entropy expression for this continuous case can be given as in Eq. (9):

H(X/X∗) = Σᵢ p(xᵢ) ln [p(xᵢ)/q(xᵢ)] Δx → ∫ p(x) ln [p(x)/q(x)] dx  (9)

The above expression, as discussed by Guiasu (1977) and Jaynes (1983), describes the variation of information (or, indirectly, the uncertainty reduced by making observations) to replace the absolute measure of information content given in Eq. (2). At this point, the most important issue is the selection of a priori distribution. In case the process *X* is not observed at all, no information is available about it so that it is completely uncertain. In probability terms, this implies the selection of the uniform distribution. In other words, when no information exists about the variable *X*, the alternative events it may assume may be represented by equal probabilities or simply by the uniform probability distribution function. Previous research efforts where the above mathematical definitions were put forward (Guiasu 1977; Jaynes 1983) did not express, in the real sense, how the probability density functions q(x) and p(x) are to be decided. The present study develops an approach to address this fundamental need as described and experimented in the following sections.
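The discretized form underlying Eq. (9) can be illustrated numerically. The sketch below compares a normal a posteriori density p(x) against a uniform a priori density q(x) over the same range; the grid resolution and the interval [a, b] are illustrative assumptions, not values from the study:

```python
import math

def variation_of_information(p_pdf, q_pdf, a, b, n=10000):
    """Approximate the integral form of Eq. (9),
    H(X/X*) = integral of p(x) * ln(p(x)/q(x)) dx,
    by a midpoint Riemann sum over n small intervals of width dx."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        p, q = p_pdf(x), q_pdf(x)
        if p > 0 and q > 0:
            total += p * math.log(p / q) * dx
    return total

# a posteriori: normal density; a priori: uniform over [a, b]
mu, sigma, a, b = 0.0, 1.0, -6.0, 6.0
p = lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
q = lambda x: 1.0 / (b - a)

h = variation_of_information(p, q, a, b)
# for this pairing the closed form is ln(b - a) - ln(sigma * sqrt(2*pi*e))
print(h)
```

The numerical sum reproduces the closed-form difference between the maximum entropy of the uniform prior and the marginal entropy of the normal posterior.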

When the a priori distribution {*q*(*x*)} is assumed to be uniform and the a posteriori distribution {*p*(*x*)} of *X* is assumed to be normal, the informational entropy *H*(*X*/*X*∗) can be expressed as in Eq. (10):

H(X/X∗) = ln R − ln(σ√(2πe))  (10)

where σ is the standard deviation of *X*; the term ln(σ√(2πe)) is the marginal entropy of the normally distributed *X*, and ln R (with R the range of the variable) is the maximum entropy. Accordingly, the variation of information can be expressed simply as in Eq. (11):

H(X/X∗) = H_max − H(X)  (11)

When the a posteriori distribution of *X* is assumed to be lognormal, the informational entropy *H*(*X*/*X*∗) becomes as in Eq. (12):

H(X/X∗) = ln R − μ_y − ln(σ_y√(2πe))  (12)

with μ_y and σ_y being the mean and standard deviation of y = ln x.

The approach presented herein progresses from the most uncertain condition prior to any observation (associated with the assumption of a uniform distribution assigning equal probabilities to all values) toward the probability distribution foreseen on the basis of the monitored values. The variation of information for an assumed a posteriori distribution can thus be computed through Eq. (10) for the normal distribution and Eq. (12) for the lognormal distribution.
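A minimal sketch of the closed forms of Eqs. (10) and (12) follows; the sample is synthetic, and `vi_normal` and `vi_lognormal` are hypothetical helper names, not routines from the study:

```python
import math
import random

def vi_normal(data):
    """Variation of information for a normal a posteriori assumption,
    H(X/X*) = ln R - ln(sigma * sqrt(2*pi*e))  (Eq. 10 form)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return math.log(max(data) - min(data)) - math.log(sigma * math.sqrt(2 * math.pi * math.e))

def vi_lognormal(data):
    """Variation of information for a lognormal a posteriori assumption,
    H(X/X*) = ln R - mu_y - ln(sigma_y * sqrt(2*pi*e))  (Eq. 12 form)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu_y = sum(logs) / n
    sigma_y = math.sqrt(sum((y - mu_y) ** 2 for y in logs) / (n - 1))
    return math.log(max(data) - min(data)) - mu_y - math.log(sigma_y * math.sqrt(2 * math.pi * math.e))

random.seed(1)
# synthetic, roughly normal "annual precipitation" sample (illustrative values)
sample = [random.gauss(800.0, 120.0) for _ in range(600)]
print(vi_normal(sample), vi_lognormal(sample))
```

Both routines share the sample range R as the maximum-entropy reference and differ only in the marginal entropy of the assumed posterior.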

### 3.2 The Meaning of the Variation of Information: The Distance between Two Continuous Distributions

The difference between the two functions {Δ(p, q)} can be expressed in terms of three components h₀, h₁, and h₂ as in Eq. (15); the components h₀, h₁, and h₂ can then be obtained as in Eqs. (16) through (18).

At the above critical half-range value “a”, which is obtained by the Max Norm, it is possible to use the two functions p(x) and q(x) interchangeably with an optimum number of observations.

When the two points represented by the a posteriori and a priori distribution functions, p(x) and q(x), respectively, approach each other in the same probability space, this indicates, in information terms, an increase of information about the analyzed random process. The case in which the two points coincide represents total information availability about the process.

### 3.3 Determination of Confidence Limits for Entropy Defined by Variation of Information

The proofs above showed that i) the variation of information equation approaches a constant value, and ii) it is possible to use the two functions q(x) (the uniform a priori distribution, i.e., probabilities prior to making observations) and p(x) (the normal a posteriori distribution, i.e., information obtained by making observations) interchangeably with an optimum number of observations. Hence, it is possible to determine the confidence limits if the observations have reached the optimum number for using the posterior distribution.

If the interval [*a*, *b*] of the variable *X* is considered also as the population value of the range, *R*, of the variable, the maximum information content of the variable may be described as in Eq. (20):

H_max = ln R = ln(b − a)  (20)

If the variation of information *H*(*X*/*X*∗) of the variable, which is assumed to be normal, remains below the above value, one may decide that the normal probability density function is acceptable and that a sufficient amount of information has been collected about the process.

When the variable is lognormally distributed with the parameters (μ_y, σ_y), the variation of information for the variable x can be determined as in Eq. (26), and the corresponding confidence limit on *H*(*X*/*X*∗) for the lognormal distribution function will be as given in Eq. (27).

One may observe here that no single constant value exists to describe the confidence limit for the lognormal distribution. Even if the critical half-range is determined, the confidence limits will vary according to the variance of the variable. However, if the variance of x is known, the confidence limits can be computed (Baran et al. 2017a, b).

If the variable is actually normally distributed and a sufficient number of observations is obtained, the entropy of Eq. (24) will approach a value that lies within the acceptable region. This is the case in which one may state that sufficient information has been collected about the process.

Consequently, as the mathematical justification above shows, the variation of information acts as a proximity measure between two density functions (uniform-normal; uniform-lognormal) defined in the probability space. When the marginal entropy is defined according to the probability structure foreseen for the investigated process, it serves to set a confidence interval for judging whether the probability distribution was properly selected: a decision on the suitability of the a posteriori distribution can be made by comparing the variation of information against that confidence interval.

## 4 Available Data

There are different climates in various parts of Turkey due to its irregular topography. The Taurus Mountains are close to the shore, and rain clouds cannot penetrate the interior of the country; by the time they pass over the mountains and reach Central Anatolia, the clouds have little remaining capacity to produce rain. The North Black Sea Mountains and the Caucasus Mountains hold the rain clouds, which is why the region experiences long, very cold winters and a continental climate. In the eastern mountains, minimum temperatures between −30 °C and −38 °C are observed, and snow can fall on 120 days a year. Winters often bring heavy snowfall, and villagers in this area can remain isolated for several days during winter storms (DMI 2002; Bacanli et al. 2008).

All meteorological data that demonstrate such regional differences in meteorological characteristics and that constitute the information basis of the analyses conducted in the presented study were provided by the Turkish State Meteorological Service (DMI). Precipitation data of 60 DMI gaging stations were investigated; the observation period extends from January 1950 to December 1998. The stations are climatological/meteorological observation stations (Automated Weather Observing System - AWOS), each located in one of 60 major cities of Turkey. From the total set of stations, only those with sufficient record lengths and without significant trends were included in the analyses before any effort to identify convenient probability structures. It is noteworthy that the trend analyses indicated significant trends in some monthly series even though no actual trend was detected in the annual precipitation data.

## 5 Application to Goodness of Fit to Annual Precipitation

Baran et al. (2017b) showed, for normally distributed synthetic data sets, that the assumption of a uniform a priori distribution and a normal a posteriori distribution is accepted, whereas the assumption of a uniform a priori and a lognormal a posteriori distribution is rejected. They also performed similar exercises by generating lognormally distributed synthetic series and assuming the a posteriori distribution first as lognormal and then as normal. In that case, the assumption of a uniform a priori distribution and a normal a posteriori distribution was rejected, while the uniform-lognormal combination was accepted.

In the presented study, the annual mean precipitation data sets captured in major cities were investigated for the goodness of fit tests. Trend analyses through the non-parametric Spearman-Rho and Mann-Kendall tests showed that there are no significant trends in the data series employed.
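The Mann-Kendall test referred to above can be sketched as follows; this is a minimal version without the tie correction, and the series values are illustrative:

```python
import math

def mann_kendall(series):
    """Mann-Kendall trend test: S statistic and its normal-approximation Z.
    S sums the signs of all pairwise differences; Var(S) uses the
    no-ties formula n(n-1)(2n+5)/18."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = series[j] - series[i]
            s += (d > 0) - (d < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

# |Z| < 1.96 means no significant trend at the 5% level
rising = list(range(30))        # a deliberately trending series
s_r, z_r = mann_kendall(rising)
print(s_r, z_r)
```

A monotonically rising series yields the maximum S and a Z score far above 1.96, so the no-trend hypothesis is rejected; the precipitation series retained in the study fall on the opposite side of this threshold.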

The best fit Probability Distribution Function (PDF) was explored for each precipitation gaging station. For this purpose, the basic statistics (mean, standard deviation, skewness, and excess coefficients) were calculated first, and the parameters of the best fit PDF indicated by the Chi-square tests were then computed.
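A Chi-square goodness-of-fit computation of the kind used here can be sketched with equiprobable bins under the hypothesized normal distribution; the bin count, sample, and parameters below are illustrative assumptions, not values from the study:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def chi_square_statistic(data, cdf, n_bins=8):
    """Chi-square goodness-of-fit statistic with equiprobable bins:
    under the hypothesized cdf, each bin expects len(data)/n_bins counts."""
    expected = len(data) / n_bins
    counts = [0] * n_bins
    for x in data:
        # cdf(x) in [0, 1) maps the observation to its equiprobable bin
        counts[min(int(cdf(x) * n_bins), n_bins - 1)] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(3)
sample = [random.gauss(500.0, 90.0) for _ in range(300)]
mu = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (len(sample) - 1))

chi2 = chi_square_statistic(sample, lambda x: normal_cdf(x, mu, sigma))
print(chi2)
```

The statistic is then compared against the Chi-square critical value for n_bins minus 1 minus the number of fitted parameters degrees of freedom.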

The goodness of fit was then tested by *Entropy as Variation of Information.* Basic statistical parameters for the normal and lognormal distributions were calculated. This analysis was followed by the computation of the marginal entropies (*H*(*X*), *H*_max) and the variation of information *H*(*X*/*X*∗) for the relevant distribution functions. The computations were carried out in a successive manner, using the first year’s 12 monthly total precipitation data, the second year’s 24, and so on until the total number of data was reached. Tests of the goodness of fit for both distributions by entropy as variation of information yielded the “*Suitable*” indication for the entire set of cases. The results are given as an example for eight selected cities out of the 60, located in different parts of Turkey. The eight indicative stations (each from a different city) were selected to exemplify the total set of 60 stations, with the consideration that each represents a different climatic zone of Turkey. The climatic zones that appear in Fig. 4 indicate the wet, humid, medium-humid, medium-dry, dry and very dry zones associated with the Zonguldak, Adapazarı, Manisa, Izmir, Ankara and Adıyaman stations, in the respective order. In designating the climate zones, different patterns in precipitation, evaporation, evapotranspiration and runoff characteristics were considered. In addition, the selections for the medium-humid and medium-dry zones were made in pairs due to the prominent differences within those climate zones in precipitation pattern and precipitation variation.
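The successive computation scheme (12 values after the first year, 24 after the second, and so on) can be sketched as follows; the monthly series is synthetic and `vi_normal` is a hypothetical helper implementing the Eq. (10) form:

```python
import math
import random

def vi_normal(data):
    """Variation of information under the normal assumption,
    ln R - ln(sigma * sqrt(2*pi*e)); a sketch of the Eq. (10) form."""
    mean = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))
    return math.log(max(data) - min(data)) - math.log(sigma * math.sqrt(2 * math.pi * math.e))

random.seed(7)
# hypothetical 40-year record of monthly precipitation totals
monthly = [max(1.0, random.gauss(60.0, 25.0)) for _ in range(12 * 40)]

# successive computation: year 1 uses the first 12 values, year 2 the first 24, ...
trajectory = [vi_normal(monthly[:12 * year]) for year in range(1, 41)]
print(trajectory[0], trajectory[-1])
```

Plotting such a trajectory against the confidence limits reproduces the kind of year-by-year acceptance pattern discussed for the station figures.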

Table 1 Results of the *Chi-square test* for the considered distributions

| Station name | Normal χ² | Log-normal χ² |
|---|---|---|
| Adapazarı | | |
| Adıyaman | | 1.22 |
| Afyon | | 8.44 |
| Amasya | | 2.52 |
| Ankara | 5.02 | |
| Izmir | | 1.31 |
| Manisa | 2.32 | |
| Zonguldak | 8.04 | 7.43 |
| … | | |

Table 2 Results of the informational entropy method for the normal distribution

| Station name | Annual mean (mm) | Range (mm) | H_max | H(X) | H(X/X∗) | Test result |
|---|---|---|---|---|---|---|
| Adapazarı | 821.67 | 540.9 | 5.87 | 5.21 | 0.65 | Suitable |
| Adıyaman | 729.98 | 606.75 | 5.91 | 5.61 | 0.30 | Suitable |
| Afyon | 420.67 | 385.8 | 5.11 | 4.66 | 0.45 | Suitable |
| Amasya | 455.28 | 289.1 | 4.97 | 4.75 | 0.22 | Suitable |
| Ankara | 399.15 | 364.5 | 4.80 | 4.66 | 0.14 | Suitable |
| Izmir | 680.06 | 760.5 | 5.94 | 5.63 | 0.31 | Suitable |
| Manisa | 733.24 | 788.4 | 5.98 | 5.63 | 0.34 | Suitable |
| Zonguldak | 1218.69 | 1065.1 | 6.46 | 5.67 | 0.79 | Rejected |
| … | | | | | | |

Table 3 Selected summarized results of the informational entropy method for the lognormal distribution

| Station name | H_max | H(X) | H(X/X∗) | Confidence limit | Test result |
|---|---|---|---|---|---|
| Adapazarı | 5.87 | 3.73 | 2.14 | 2.1434 | Suitable |
| Adıyaman | 5.91 | 4.19 | 1.71 | 2.0746 | Suitable |
| Afyon | 5.11 | 3.22 | 1.90 | 2.1050 | Suitable |
| Amasya | 4.97 | 3.31 | 1.66 | 2.1002 | Suitable |
| Ankara | 4.80 | 3.23 | 1.57 | 2.0909 | Suitable |
| Izmir | 5.94 | 4.20 | 1.74 | 2.0937 | Suitable |
| Manisa | 5.98 | 4.21 | 1.76 | 2.0774 | Suitable |
| Zonguldak | 6.46 | 4.20 | 2.25 | 2.1215 | Rejected |
| … | | | | | |

## 6 Results and Discussion

In the presented study, informational *entropy* was used to test the goodness of fit of the normal and lognormal probability distribution functions.

The results showed that the same Chi-square statistics were obtained for the Adapazarı station (Table 1). The values of the variation of entropy computed from the complete series indicate that both the normal and lognormal distributions can be accepted (Tables 2 and 3). However, inspection of Fig. 5a and b shows that the value of the variation of information fell outside the confidence limits for both distributions during the first 17 years of the monitoring period. With the information gained through observations in later periods, the changing values of the variation of information indicate the better fit of the normal distribution.

The Chi-square computations again display the good fit of both distributions in the case of the Adıyaman station, with a slightly better performance indicated for the normal distribution (Table 1). The values of the variation of information computed from the entire series again indicate the acceptability of both the normal and lognormal distributions (Tables 2 and 3). Figure 6a and b show that the values of the variation of information lie inside the confidence limits for both distributions. The additional information provided by the longer observation period indicates a better fit for the normal distribution, as shown by the associated values of the variation of information.

In the case of the Ankara station, the Chi-square statistics indicate the acceptability of both distributions, but the lognormal distribution appears preferable (Table 1). The variation of entropy computed over the entire period, on the other hand, indicates the suitability of both distributions (Tables 2 and 3). The variation of information is clearly inside the confidence limits for both distributions (Fig. 7a and b). As the values of the variation of information indicate, the lognormal distribution shows the better performance as the observation period extends.

As can be seen from the mean total precipitation and range values (see Table 2), the average annual precipitation at the Manisa station is almost 1.6 times that at the Amasya station. The range/mean ratios were determined as 0.64 and 1.07 for the Amasya and Manisa stations, respectively. While the best fit distribution is normal in the case of the Amasya station, the analyses indicate the suitability of the lognormal distribution for the Manisa station (Table 1 and Fig. 9). It is also observed from Table 1 and Fig. 8 that the Izmir station receives 1.62 times the precipitation of the Afyon station, with range/mean ratios of 0.92 and 1.12 for the Afyon and Izmir stations, respectively. The normal distribution appeared to be the best fit in the case of both stations.

The Chi-square results showed that both distributions can be accepted for the Izmir station, although the normal distribution is more suitable (Table 1). The values of the variation of entropy calculated over the whole series indicate the same result (Tables 2 and 3). Evaluation of Fig. 8a and b shows that the variation of information lies within the confidence limits for both distributions. The information gained with the increasing number of observations shows that the developing pattern of the variation of information gives a slightly better indication for accepting the lognormal distribution, in contrast to the Chi-square result.

The Chi-square results showed that both distributions can be accepted for the Manisa station, but the lognormal distribution is more suitable than the normal distribution (Table 1). The values of the variation of entropy calculated over the whole series lead to the same inference (Tables 2 and 3). Figure 9a and b demonstrate that the variation of information lies within the confidence limits for both distributions, while the increase in data length over the monitored years, together with the associated values of the variation of information, provides further support for accepting the lognormal distribution.

As regards the data monitored at the Zonguldak station, the Chi-square results suggest rejection of both distributions (Table 1). The variation of entropy computed from the entire data series likewise indicates rejection of both (Tables 2 and 3). On the other hand, the variation of information statistic seems to stay inside the confidence limits for both as of the end of years 6 and 7, but with the inclusion of data from later periods the changing pattern of the variation of information shows that the confidence limits are exceeded for both distributions, as given in Fig. 10a and b.

## 7 Conclusions

In the presented study, entropy analyses were performed to investigate the validity of the probability distribution identified from the time series data recorded at meteorological stations. To this end, the concept introduced with the given mathematical approach was named the variation of information, and using this new definition, the distribution-fitting performances of the normal and lognormal distributions were assessed by considering the associated confidence limits.

The computational results were evaluated in comparison to the Chi-square statistic, a test that has been widely used in investigating the goodness of fit of a variety of distributions. In the evaluations of the normal and lognormal distributions for the monthly total rainfall time series, the variation of information concept was found to give results in line with the Chi-square values.

The feasibility gained through the concept of variation of information, which allows testing the acceptance of posterior distributions estimated from the number of observations in a time series, is also expected to contribute to analyses that specifically consider the distributional structure of investigated time series, and to efforts to examine the compatibility between recorded time series and the synthetic series produced by modelling.

Consequently, the presented application of entropy as a variation of information can be regarded as an effective tool for evaluating hydrological data alongside other testing methods, and as a potential tool for investigating climate change effects, for drought analysis, and in decision-making processes.

## Notes

### Acknowledgments

A previous, shorter version of this paper was presented at the 10th World Congress of EWRA, “Panta Rhei”, Athens, Greece, 5–9 July 2017.

### Compliance with Ethical Standards

### Conflicts of Interest

The authors declare no conflict of interest.
