Introduction

Drought is a natural, recurring phenomenon caused by a reduction in rainfall over a certain period. How short or mild a drought turns out to be depends on its severity, its duration, and the extent of the affected area. It may begin slowly, yet its effects emerge over a relatively long interval in different sectors, including agriculture, water resources, the economy, and the environment (Mishra and Singh 2010). Drought can occur under any climatic conditions throughout the world. The design and management of water resources and agricultural systems depend strongly on how drought is managed and on the adoption of proper guidelines for facing such phenomena (Fadaei-Kermani et al. 2017).

Drought prediction and monitoring can play a very important role in water resources management and can remarkably decrease damage. In general, the intensity of drought is predicted and monitored via drought indices. These indices aim to describe the phenomenon quantitatively by combining the different factors that affect drought into simple quantitative relations. Various indices are used to monitor drought, including the Palmer Drought Severity Index (PDSI) (Palmer 1968), the Deciles Index (DI) (Gibbs and Maher 1976), the Standardized Precipitation Index (SPI) (McKee et al. 1993), the Reclamation Drought Index (RDI) (Weghorst 1996), and the US Drought Monitor (USDM) (Svoboda et al. 2002).

In recent years, drought and the crises and threats that follow from it have become one of the most important global challenges. A large body of research has been conducted on drought monitoring and control techniques (e.g., Luo and Wood 2007; Paulo and Pereira 2008; Rhee et al. 2010; Pan et al. 2013; Fadaei-Kermani et al. 2014; Hao and AghaKouchak 2014; Wood et al. 2015; Hao et al. 2016; Park et al. 2017; Yu et al. 2018; Abbasi et al. 2019). These studies used drought indices, machine learning, and data mining algorithms to monitor and predict the severity of the effects caused by drought.

With an average annual rainfall of 240–250 mm, about one third of the global average, Iran is among the regions with insufficient precipitation. Since most of the country is arid, water has played a vital role in its economic development. In the present study, a method based on the fuzzy k-nearest neighbor model is proposed to predict the most likely drought status of Kerman, in southeastern Iran. Nonparametric techniques such as the fuzzy k-nearest neighbor algorithm are convenient approaches for estimating drought conditions. These algorithms are useful in problems where the relationships between instances are not known in advance or fully determined.

Standardized Precipitation Index (SPI)

McKee et al. (1993) proposed the Standardized Precipitation Index (SPI) for drought monitoring at multiple time scales. SPI is widely used for detecting and characterizing meteorological drought, and its values can be compared across regions with significantly different climates. It is determined from long-term precipitation records: the cumulative probability of the observed precipitation is transformed to a standard normal variate (Z) according to the following equations:

$$ {\text{SPI}} = - \left[ {a - \frac{{E_{0} + E_{1} a + E_{2} a^{2} }}{{1 + d_{1} a + d_{2} a^{2} + d_{3} a^{3} }}} \right]\;a = \sqrt {\ln \left[ {\frac{1}{{P(x)^{2} }}} \right]} \quad 0 < P(x) \le 0.5 $$
(1)
$$ {\text{SPI}} = + \left[ {a - \frac{{E_{0} + E_{1} a + E_{2} a^{2} }}{{1 + d_{1} a + d_{2} a^{2} + d_{3} a^{3} }}} \right]\;a = \sqrt {\ln \left[ {\frac{1}{{\left( {1 - P(x)} \right)^{2} }}} \right]} \quad 0.5 < P(x) < 1 $$
(2)
$$ \begin{aligned} & d_{1} = 1.432788\quad E_{0} = 2.515517 \\ & d_{2} = 0.189269\quad E_{1} = 0.802853 \\ & d_{3} = 0.001308\quad E_{2} = 0.010328 \\ \end{aligned} $$

where P(x) is the cumulative probability of the precipitation amount x. A time series is first built from the precipitation data. After the data are sorted in increasing order, the empirical cumulative probability (ECP) is calculated as follows (Fadaei-Kermani et al. 2017):

$$ {\text{ECP}} = \frac{a}{b + 1} $$
(3)

where a is the rank (row number) of a value in the sorted precipitation data (not to be confused with the auxiliary variable a in Eqs. 1 and 2), and b is the total number of precipitation data. The standard normal cumulative distribution curve can then be used to calculate the SPI for each time scale.
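Eqs. 1 to 3 can be chained in a few lines. The following is a minimal sketch, not the authors' implementation: the function names and the precipitation totals are hypothetical, and the constants are those of the standard rational approximation to the normal quantile.

```python
import math

# Constants of the rational approximation used in Eqs. 1 and 2.
# E0 = 2.515517 is the value in the standard form of the approximation (assumption).
E0, E1, E2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def spi_from_probability(p):
    """Map a cumulative probability 0 < p < 1 to an SPI value (Eqs. 1 and 2)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tail = p if p <= 0.5 else 1.0 - p          # lower or upper tail probability
    a = math.sqrt(math.log(1.0 / tail**2))
    z = a - (E0 + E1 * a + E2 * a**2) / (1.0 + D1 * a + D2 * a**2 + D3 * a**3)
    return -z if p <= 0.5 else z               # negative below the median


def empirical_probabilities(precip):
    """Eq. 3: rank / (b + 1), with ranks taken in increasing order of precipitation."""
    b = len(precip)
    order = sorted(range(b), key=lambda i: precip[i])
    ecp = [0.0] * b
    for rank, idx in enumerate(order, start=1):
        ecp[idx] = rank / (b + 1)
    return ecp


# Hypothetical precipitation totals (mm), for illustration only.
totals = [12.0, 48.5, 3.2, 27.9, 19.4]
spi = [spi_from_probability(p) for p in empirical_probabilities(totals)]
```

The median value of the sample maps to an SPI close to zero, and drier totals map to negative values, which matches the sign convention of Eqs. 1 and 2.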

Table 1 presents the classification of drought intensity according to the range of SPI values. A drought event is likely to occur whenever the SPI value is continuously negative, and the event ends when the SPI value becomes positive (Moreira et al. 2006).
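The run-based definition of a drought event can be illustrated as follows. This is a sketch under stated assumptions: the function name, the `min_length` parameter and the SPI values are hypothetical, and an SPI of exactly zero is treated as not ending an event, since the text only says that a positive value does.

```python
def drought_events(spi_series, min_length=1):
    """Return (start, end) index pairs of runs that open on a negative SPI
    value and close just before the next positive value."""
    events, start = [], None
    for i, value in enumerate(spi_series):
        if value < 0 and start is None:
            start = i                          # a negative value opens an event
        elif value > 0 and start is not None:
            if i - start >= min_length:        # keep only sufficiently long runs
                events.append((start, i - 1))
            start = None
    # The series may end while an event is still open.
    if start is not None and len(spi_series) - start >= min_length:
        events.append((start, len(spi_series) - 1))
    return events


# Hypothetical SPI values, for illustration only.
series = [0.4, -0.3, -1.2, -0.8, 0.1, 0.6, -0.5, -0.2]
print(drought_events(series))                  # [(1, 3), (6, 7)]
print(drought_events(series, min_length=3))    # [(1, 3)]
```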

Table 1 Classification of drought intensity according to SPI values

Fuzzy-nearest neighbor algorithm

As one of the most popular nonparametric, lazy, instance-based machine learning algorithms, the k-nearest neighbor (k-NN) algorithm is extensively applied in data mining and pattern recognition (Fadaei-Kermani et al. 2015). In recent years, several approaches to nearest neighbor modeling based on fuzzy mathematics have been suggested to improve the quality of classification. Keller et al. (1985) proposed a fuzzy version of the basic k-NN algorithm by incorporating fuzzy set theory into the standard k-NN; it was named the Fuzzy-nearest neighbor algorithm (Fuzzy-kNN). Both algorithms measure the similarity of a new (unknown) instance to the labeled instances in the training set. Then, by determining a set of k nearest neighbors and letting each cast a vote on the class of the query instance, the most likely class is assigned to the unknown instance by combining all the votes (Derrac et al. 2016; Ezghari et al. 2017).

In the Fuzzy-nearest neighbor algorithm, rather than being assigned to an individual class as in k-nearest neighbor modeling, each sample receives a membership in every class through a Fuzzy Membership Function (FMF) (Kermani et al. 2018). Let \( X = \left( {x_{1} , x_{2} , \ldots , x_{n} } \right) \) be a training set consisting of n labeled samples that belong to C classes. For a new unknown instance y, the class membership values are determined by aggregating the class memberships of its k nearest neighbors according to Eq. 4 (Keller et al. 1985).

$$ u_{i} \left( y \right) = \frac{{\mathop \sum \nolimits_{j = 1}^{k} \mu_{ij} \left( {1/\|y - x_{j}\|^{{2/\left( {m - 1} \right)}} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{k} \left( {1/\|y - x_{j}\|^{{2/\left( {m - 1} \right)}} } \right)}} $$
(4)

where \( i = 1,2, \ldots , C \), and \( j = 1,2, \ldots , k \). The fuzzy strength parameter m controls how heavily the distance between the unknown instance and each neighbor is weighted. The value of m can be chosen as \( m \in \left( {1, + \infty } \right) \) and is often set to m = 2. \( \|y - x_{j}\| \) is the distance between y and its jth nearest neighbor xj in the training set. \( \mu_{ij} \) is the membership of the training instance xj, one of the k nearest neighbors of y, in class i; it satisfies the following relations (Derrac et al. 2016):

$$ \mu_{ij} \in \left[ {0, 1} \right] $$
(5a)
$$ 0 < \mathop \sum \limits_{j = 1}^{k} \mu_{ij} < k $$
(5b)
$$ \mathop \sum \limits_{i = 1}^{C} \mu_{ij} = 1 $$
(5c)

where \( 1 \le i \le C \) and \( 1 \le j \le k \).
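Eq. 4 amounts to an inverse-distance weighted average of the neighbors' memberships. The sketch below assumes Euclidean distance and uses hypothetical names and data; the small `eps` guard against a zero distance is also an addition for illustration, not part of Eq. 4.

```python
import numpy as np


def fuzzy_knn_memberships(X_train, U_train, y, k=3, m=2.0, eps=1e-12):
    """Class memberships u_i(y) of a query point y, following Eq. 4.

    X_train: (n, d) training samples. U_train: (n, C) memberships mu_ij,
    one row per training sample. k neighbours, fuzzy strength m > 1.
    """
    X_train = np.asarray(X_train, dtype=float)
    U_train = np.asarray(U_train, dtype=float)
    dist = np.linalg.norm(X_train - np.asarray(y, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]             # indices of the k nearest samples
    # Weights 1 / ||y - x_j||^(2/(m-1)); eps avoids division by zero.
    weights = 1.0 / np.maximum(dist[nearest], eps) ** (2.0 / (m - 1.0))
    return weights @ U_train[nearest] / weights.sum()


# Hypothetical one-dimensional training set with crisp two-class labels.
X = [[0.0], [1.0], [2.0], [5.0]]
U = [[1, 0], [1, 0], [0, 1], [0, 1]]
u = fuzzy_knn_memberships(X, U, [1.5], k=3, m=2.0)
print(np.round(u, 3))                          # [0.526 0.474]
```

Because every row of `U` sums to one (Eq. 5c), the returned memberships also sum to one, and the predicted class is simply the one with the largest membership.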

In the general Fuzzy-kNN model, various techniques can be applied to define \( \mu_{ij} \). With crisp labeling, every instance has a membership of one in its known class and zero in all other classes. With constrained fuzzy membership, the K nearest neighbors of every training sample (xk) are determined, and the membership of xk in every class is then calculated using the following membership function (Keller et al. 1985):

$$ \mu_{ij} \left( {x_{k} } \right) = \left\{ {\begin{array}{ll} {0.51 + \left( {\frac{{n_{j} }}{K}} \right)*0.49, } \hfill &\quad { {\text{if}}\;j = i} \hfill \\ {\left( {\frac{{n_{j} }}{K}} \right)*0.49,} \hfill &\quad { {\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(6)

where nj is the number of neighbors found that belong to the jth class, and i is the known class of xk. With the fuzzy procedure, the algorithm makes no arbitrary assignments. Moreover, the membership values provide a level of assurance that accompanies the resulting classification.
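The membership initialization of Eq. 6 can be sketched as follows. The names and the data are hypothetical, Euclidean distance is assumed, and each sample is excluded from its own neighbor list, which is a common reading of the procedure rather than something stated above.

```python
import numpy as np


def init_memberships(X, labels, n_classes, K=3):
    """Fuzzy memberships of labelled training samples, following Eq. 6."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    U = np.zeros((len(X), n_classes))
    for idx in range(len(X)):
        dist = np.linalg.norm(X - X[idx], axis=1)
        dist[idx] = np.inf                     # assumption: skip the sample itself
        neighbours = np.argsort(dist)[:K]
        counts = np.bincount(labels[neighbours], minlength=n_classes)
        U[idx] = 0.49 * counts / K             # (n_j / K) * 0.49 for every class
        U[idx, labels[idx]] += 0.51            # extra 0.51 for the known class
    return U


# Hypothetical one-dimensional samples in two classes.
X = [[0.0], [1.0], [2.0], [5.0], [6.0]]
labels = [0, 0, 0, 1, 1]
U = init_memberships(X, labels, n_classes=2, K=2)
print(np.round(U[3], 3))                       # [0.245 0.755]
```

A sample deep inside its own class keeps a membership of one, while a sample near the class boundary (here the one at 5.0) shares part of its membership with the neighboring class.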

Model processing and application

In the present study, the hydrological and precipitation data of Kerman city for 1980–2018 have been investigated. The area is located in southeastern Iran between 53°26′ and 59°29′ E longitude and 25°55′ and 32° N latitude (Fig. 1). Drought has always been a prevalent phenomenon in Kerman province, and the area has never been free from its destructive consequences.

Fig. 1 The location of Kerman province on the map of Iran

From the precipitation data of Kerman city, moving time series were built for the different time scales, and the corresponding standard normal distribution functions were determined. By calculating the standard normal cumulative distribution, the SPI value can be obtained for each time scale. Figure 2 presents the cumulative precipitation probability and standard normal probability distribution functions of the Kerman precipitation data for the 3-, 6-, 12-, 24- and 48-month time scales. These graphs can be used to determine the SPI value and the corresponding drought status from the precipitation data for each time scale.
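A moving time series for a given time scale can be sketched as a running total of monthly precipitation. This assumes, as is usual for SPI, that the moving series are sums over consecutive months; the function name and the monthly values are hypothetical.

```python
def moving_totals(monthly_precip, window):
    """Total precipitation over each run of `window` consecutive months."""
    if window < 1 or window > len(monthly_precip):
        raise ValueError("window must be between 1 and the series length")
    running = sum(monthly_precip[:window])
    totals = [running]
    for i in range(window, len(monthly_precip)):
        # Slide the window: add the newest month, drop the oldest one.
        running += monthly_precip[i] - monthly_precip[i - window]
        totals.append(running)
    return totals


# Hypothetical monthly precipitation (mm), for illustration only.
monthly = [10, 0, 5, 20, 15, 0]
print(moving_totals(monthly, window=3))        # [15, 25, 40, 35]
```

Calling the function with windows of 3, 6, 12, 24 and 48 months would give one series per time scale, each of which can then be converted to SPI.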

Fig. 2 The precipitation cumulative and standard normal probability distribution functions

Then, according to the standard normal probability and the precipitation cumulative probability distribution functions, the SPI values for various time scales have been calculated. For example, Fig. 3 shows the calculated values of 3-, 6-, 12- and 24-month SPI for the study area during different years.

Fig. 3 The 3-, 6-, 12- and 24-month SPI values for different years

Then the calculated values of SPI during the desired period can be applied in the Fuzzy-nearest neighbor model. Before working with the model, the data should be normalized using Eq. 7.

$$ Y^{'} = \frac{{y - \overline{y} }}{\sigma (y)} $$
(7)

where the normalized variable value (Y′) can be obtained according to standard deviation (σ (y)) and mean (\( \overline{y} \)) of the observed variable values in the reference dataset.
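The normalization of Eq. 7 can be sketched as below. The function name and the sample values are hypothetical, and the population standard deviation is used as an assumption, since the text does not say whether the population or the sample form is meant.

```python
import statistics


def standardize(values):
    """Eq. 7: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)             # assumption: population form
    if sd == 0:
        raise ValueError("standard deviation is zero; values are constant")
    return [(v - mean) / sd for v in values]


# Hypothetical observed values with mean 5 and standard deviation 2.
observed = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(observed))   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```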

Finally, the accuracy and efficiency of the model can be evaluated via the root-mean-square error (RMSE), mean absolute error (MAE), coefficient of correlation (r) and coefficient of residual mass (CRM). These coefficients can be obtained using the following equations (Dashtaki et al. 2009):

$$ {\text{MAE}} = \frac{{\sum\nolimits_{i = 1}^{n} {\left| {x_{i} - y_{i} } \right|} }}{n} $$
(8)
$$ r = \frac{{n\left[ {\sum\nolimits_{i = 1}^{n} {y_{i} x_{i} } } \right] - \left[ {\sum\nolimits_{i = 1}^{n} {y_{i} } } \right]\left[ {\sum\nolimits_{i = 1}^{n} {x_{i} } } \right]}}{{\sqrt {\left[ {n\sum\nolimits_{i = 1}^{n} {y_{i}^{2} } - \left( {\sum\nolimits_{i = 1}^{n} {y_{i} } } \right)^{2} } \right]\left[ {n\sum\nolimits_{i = 1}^{n} {x_{i}^{2} } - \left( {\sum\nolimits_{i = 1}^{n} {x_{i} } } \right)^{2} } \right]} }} $$
(9)
$$ {\text{RMSE}} = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {(x_{i} - y_{i} )^{2} } }}{n}} \right]^{0.5} $$
(10)
$$ {\text{CRM}} = \frac{{(\sum\nolimits_{i = 1}^{n} {x_{i} } ) - (\sum\nolimits_{i = 1}^{n} {y_{i} } )}}{{\sum\nolimits_{i = 1}^{n} {x_{i} } }} $$
(11)

where yi and xi are the predicted and measured values, respectively, and n is the number of compared values.
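The four coefficients of Eqs. 8 to 11 can be computed together. The following is a minimal sketch with a hypothetical function name and made-up values, where `measured` plays the role of x and `predicted` the role of y.

```python
import math


def evaluate(measured, predicted):
    """Return MAE, r, RMSE and CRM as defined in Eqs. 8 to 11."""
    n = len(measured)
    diffs = [x - y for x, y in zip(measured, predicted)]
    sx, sy = sum(measured), sum(predicted)
    sxy = sum(x * y for x, y in zip(measured, predicted))
    sxx = sum(x * x for x in measured)
    syy = sum(y * y for y in predicted)
    return {
        "MAE": sum(abs(d) for d in diffs) / n,
        "r": (n * sxy - sy * sx)
        / math.sqrt((n * syy - sy**2) * (n * sxx - sx**2)),
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "CRM": (sx - sy) / sx,                 # positive when the model underestimates
    }


# Hypothetical measured and predicted values, for illustration only.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores["MAE"], scores["CRM"])            # 0.25 0.0
```

In this example the over- and underestimates cancel in the sums, so CRM is zero even though MAE and RMSE are not, which is why the coefficients are reported together.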

Results and discussion

Before any calculation, the number of nearest neighbors for the Fuzzy-kNN model must be determined. The best value of k can be found by the n-fold cross-validation method. First, the data set is divided into n equal-sized parts (Fig. 4). Each part in turn is held out, the model is trained on the remaining parts, and the prediction error of the fitted model is calculated on the held-out part. The procedure is repeated for every candidate value of k (k = 1, 2, …, K), and the value with the minimum prediction error is selected (Huang et al. 2017). Figure 5 shows the error of the fourfold cross-validation, measured by the sum of squared errors (SSE). According to Fig. 5, k = 15, 17 and 18 share the same lowest error rate. The value k = 18 was selected for the Fuzzy-kNN model because, according to Travis and Mays (2010), larger k values often reduce the risk of overfitting.
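The selection of k by n-fold cross-validation can be sketched as follows. To keep the example short, a plain k-NN regressor stands in for the Fuzzy-kNN model; the function names, the contiguous (unshuffled) folds and the demo data are all assumptions made for illustration. Ties in SSE are broken in favor of the largest k, mirroring the choice described above.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Plain k-NN regression (mean of the k nearest targets); a stand-in model."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argsort(dist)[:k]].mean())
    return np.array(preds)


def select_k(X, y, candidate_ks, n_folds=4):
    """Return (best_k, sse_by_k) from n-fold cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = np.array_split(np.arange(len(X)), n_folds)   # contiguous folds
    sse_by_k = {}
    for k in candidate_ks:
        sse = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(X)), held_out)
            pred = knn_predict(X[train], y[train], X[held_out], k)
            sse += float(np.sum((pred - y[held_out]) ** 2))
        sse_by_k[k] = sse
    lowest = min(sse_by_k.values())
    # Among candidates with the lowest SSE, prefer the largest k.
    best_k = max(k for k, sse in sse_by_k.items() if np.isclose(sse, lowest))
    return best_k, sse_by_k


# Hypothetical data, for illustration only.
X_demo = np.arange(12).reshape(-1, 1)
y_demo = np.sin(np.arange(12) / 2.0)
best_k, sse_by_k = select_k(X_demo, y_demo, candidate_ks=[1, 2, 3], n_folds=4)
```

Each candidate k must not exceed the size of the training portion of a fold, otherwise fewer than k neighbors are available.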

Fig. 4 The n-fold cross-validation method scheme

Fig. 5 The error rate for Fuzzy-kNN model according to fourfold cross-validation

After the best value of K has been determined, it is passed to the Fuzzy-kNN model for further computations. Then, from the calculated SPI values, the most likely drought status of the city of Kerman was determined for different years. Table 2 and Fig. 6 present the drought classification of the region as determined and predicted by the Fuzzy-kNN model.

Table 2 Drought classification assigned by the Fuzzy-kNN model for the city of Kerman
Fig. 6 Assigned membership to each drought class according to the Fuzzy-kNN model

According to the results, Kerman has recently experienced drought, with rainfall at or well below normal levels. This is evident in the drought classes assigned to the region (classes 4 and above). The average annual precipitation in Kerman is about 122 mm, roughly half the national average of about 250 mm, which is itself very low on the global scale; this indicates the intensity of the phenomenon and the damage it causes in this region. The accuracy and precision of the model results have been evaluated by the coefficient of correlation (r), root-mean-square error (RMSE), coefficient of residual mass (CRM) and mean absolute error (MAE). The calculated values are presented in Table 3. These values indicate acceptable precision and a low error rate of the Fuzzy-kNN model for drought monitoring and prediction.

Table 3 The Fuzzy-kNN model evaluation

Conclusion

Drought is a climatic phenomenon that can occur under any climatic conditions and can affect different aspects of water resources management and planning. The present study investigated drought intensity and status in the city of Kerman, Iran. A new approach based on the Fuzzy-kNN model was proposed to monitor and predict the most likely drought status of the study area. First, the SPI values for different time scales were determined from the precipitation data. Then the Fuzzy-kNN model was employed to predict the most likely drought status. The results showed that the city has faced drought and rainfall shortages in recent years, which is consistent with real observations. Finally, the coefficient of correlation (r = 0.924), root-mean-square error (RMSE = 0.108), coefficient of residual mass (CRM = 0.0012) and mean absolute error (MAE = 0.101) were calculated from the results of the Fuzzy-kNN model. These values indicate that the present model is efficient and accurate.