Introduction

Precipitation is the process by which water vapor condenses under atmospheric conditions and precipitates in liquid or solid form due to the force of gravity. In terms of being located on the belt of arid regions of the world, Iran is mostly surrounded by arid and semi-arid areas, and the lack of rainfall in these regions makes it difficult to establish vegetation. The amount of rainfall on the country's watersheds is less than one third of the world average. The estimation of precipitation is necessary for the implementation of water resources studies, drought, land management, environment, watershed management and comprehensive agricultural projects. Today, there are various methods such as numerical and intelligent models to simulate the phenomenon of precipitation. Machine learning (ML), metaheuristic models, optimization algorithms, and neuro-fuzzy methods are among the most important useful tools for modeling various nonlinear and complex phenomena in recent years (Khoshbin et al. 2016; Akhbari et al. 2017; Shabanlou et al. 2018; Azimi and Shiri 2021; Azimi et al. 2022). Artificial intelligence (AI)-based models have been broadly implemented in different fields to model several linear and nonlinear problems because they are quick, inexpensive, reliable and precise.

These models have recently been used to model hydrological phenomena such as precipitation. For example, Kisi and Shiri (2011) simulated the daily precipitation of the Izmir station in Turkey by combining neuro-fuzzy models, gene expression programming and wavelet function of two hybrid models. Mekanik et al. (2013) estimated long-term rainfall over a 110-year period within a region of South Australia using multiple linear regression models and artificial neural networks (ANNs). The researchers reviewed the results of the study and concluded that the ANN model performed better. Subsequently, Abbot and Marohasy (2014) predicted monthly rainfall in Queensland, Australia, by selecting inputs and optimizing them using ANNs. They concluded that predictions done by the ANN are more accurate than common models. Akrami et al. (2014) by employing ANFIS, ANN and wavelet transform, presented different hybrid numerical models to simulate rainfall in the Klang River Basin in Malaysia. In their study, it was shown that the results of wavelet-ANFIS and wavelet-ANN models predicted precipitation values with better accuracy. Shafaei et al. (2016) estimated rainfall data of the 40-year time series of the city of Nahavand by a hybrid model.

They integrated the ANN, SARIMA, and wavelet transform models. They showed that the model developed from the combination of these three algorithms simulated the values of the objective function more accurately. In another research conducted by Phukoetphim et al. (2016), the ANN and symbolic regression were used to integrate the computed discharges of rainfall-runoff models in several basins of Thailand and New Zealand. Their findings showed that multi-model methods for flow estimation depend on the type and characteristics of the basin. Sarzaeim et al. (2017), in order to predict runoff under climate change conditions, implemented AI approaches such as genetic algorithm and ANN and support vector machine in the Aidonmush basin in Iran. They concluded that the SVM method displayed a better performance than the others. Then, Purnomo et al. (2017) estimated the amount of rainfall in the period from 2001 to 2013 in the center of Java, Indonesia by two different ANN models. They showed that the neural models estimated the amount of rainfall with acceptable accuracy. Also, Yaseen et al. (2018) using the firefly algorithm managed to optimize ANFIS to estimate rainfall in one of the areas of Malaysia. They indicated that the hybrid model estimated rainfall values more accurately than ANFIS. Danladi et al. (2018) utilized ANFIS to estimate precipitation changes over a short period of time. They considered rainfall amounts as a function of temperature and relative humidity and modeled them by employing these two variables. Diez-Sierra and del Jesus (2020) estimated daily rainfall amounts in a semi-arid region in Spain using regression models, the ANN and SVM. By testing the simulation results, the authors showed that the ANN model performed better.

More recently, Endalie et al. (2022) used a forecasting model based on long short-term memory (LSTM) capable of forecasting Jimma's daily rainfall. They also compared this machine learning with other methods such as multilayer perceptron (MLP), k-nearest neighbors (KNN), support vector machine (SVM), and decision tree (DT).

Studying previous researches reveals that the combined model of ANFIS network, firefly algorithm and wavelet model has not been used to simulate long-term rainfall. Therefore, in the present study, the ANFIS, firefly and wavelet models are integrated and the long-term rainfall of Ardabil is simulated on a monthly basis over a 44-year period from 1976 to 2020 for the first time. First, via the autocorrelation function, the impacting lags of the time series data are identified and using them, the rainfall values are estimated. In the following, by performing several different analyzes, the best model in computing rainfall amounts is introduced.

Methods

Study area

The city of Ardabil is situated in the middle of Ardabil Plain; this city is situated at an altitude of 1500 m above sea level and between the Talesh and Sabalan mountains in the northwest of the Iranian plateau. The Mediterranean flow as the main source of rainfall along with moderating the temperature and humidity enters from the west. The air flow of Siberia in Central Asia enters from the northeast with a cold nature, and after passing the Caspian Sea and absorbing water vapor of Ardabil in summer, it causes a sharp reduction in heat and cooling of the air. In the region, there are two types of local winds and front winds that start with the penetration of the front toward the region. Rainfall occurs in all seasons, but it is more intense in spring and autumn. Based on the Ardabil meteorological station data, the average rainfall is 327.7 mm. This area has very cold winters and mild summers with an average temperature of about seven degrees Celsius. Figure 1 shows the study area.

Fig. 1
figure 1

The location of Ardabil city, Iran, as the study area in this research

In this paper, a hybrid method is presented for precipitation forecasting in Ardabil. The proposed hybrid method is on the basis of the ANFIS which is optimized using the wavelet transform and firefly algorithm. In the following, the details of each method, along with how to combine them, are given in detail.

ANFIS

Fuzzy systems are one of the relatively new and popular methods in modeling dynamic and complex phenomena. The theory of fuzzy systems based on linguistic expression was proposed in 1965 by Zadeh (Zadeh 1965). On the other hand, classical neural networks have been considered by many researchers in recent decades as a powerful tool in modeling real problems. Jang (1993), by integrating the ability of neural network in training and using linguistic expression presented in fuzzy systems, introduced the ANFIS so that the ability of training in neural network methods to increase the modeling feature provided in fuzzy systems by Zadeh (1965) is employed.

The ANFIS network has two main parts called "antecedent" and "consequent". The antecedent part uses Takagi–Sugeno Fuzzy systems. The general structure defined for the rules in this part is in the form of Eq. (1):

$${\text{If}}\,\,\left( {A_{1} \,\,{\text{is}}\,\,K_{iz} } \right),\,\,\,\,...\,,\,{\text{and}}\,\,\left( {A_{n} \,\,{\text{is}}\,\,K_{nz} } \right)\,\,{\text{Then}}\,\,B_{z} = q_{0} + q_{1z} A_{1} + ... + q_{nz} A_{n}$$
(1)

where Kiz (i = 1, 2,…, n) is known as a fuzzy set, so that each of these fuzzy sets is used as a linguistic expression to provide a linguistic explanation of each input. The ANFIS network defined by Jang (1993) has 5 general layers, each of which is presented below with its details.

Layer 1: The nodes provided in the first layer are considered as the degree of membership of each of the inputs of the problem, which are defined as Eq. (2).

$$O_{I,1} = \mu K_{iz} \left( A \right)$$
(2)

where A denotes the input of the ith node, Kiz is the fuzzy set associated with the input, and μ is the membership function (MF). MFs that can be employed in ANFIS have different shapes such as trapezoidal, triangular, and Gaussian. Among these functions, the Gaussian MF has received more attention from researchers than others (Gholami et al. 2017, 2018; Moradi et al. 2019). Some of the most important features of this function are its smoothness, low number of parameters (C and sigma) and related concise notation. The general form of this function is as follows:

$$\mu_{{K_{IZ} }} \left( {A_{I} } \right) = \exp \left( {\frac{{ - \left( {A_{i} - c_{iz} } \right)^{2} }}{{2\sigma_{iz}^{2} }}} \right)$$
(3)

In this regard, Kiz is a fuzzy set, μ denotes the MF, A is the input, and {c, σ} are the parameters of the Gaussian MF whose optimal values are calculated through the training process.

Fuzzy rules are defined using the units defined in the first layer of the ANFIS network. Each of these fuzzy rules will be completely connected to all the units provided at the first layer. By defining the variable wzr (r = 1, 2,…, Z), the activation level of each fuzzy rule is obtained. In addition, in this layer, the AND operator and the product function are used to connect all fuzzy rules to the first layer unite. The degree of membership of each of the defined rules is calculated as follows:

$$w_{z} = \prod\limits_{i = 1}^{n} {\mu_{{K_{iz} }} \left( {A_{i} } \right)}$$
(4)

By considering at least one unit for each of the rules defined at the second layer, a complete connection can be made between all the rules in the third layer and the second layer. Using the established connection, the firing strength of each rule is obtained by computing the ratio of the membership degree of each rule to the sum of membership degrees of all rules, as Eq. (5):

$$\overline{w}_{z} = \frac{{w_{z} }}{{w_{1} + w_{2} + ... + w_{Z} }}\,\,\,\,\,z = 1,2,...,Z$$
(5)

The output of all the rules in the fourth layer is completely connected to the previous layer and calculated as follows:

$$O_{z} = \overline{w}_{z} y_{z} = \overline{w}_{r} \left( {q_{0z} + q_{1z} A_{1z} + ... + q_{nz} A_{nz} } \right)$$
(6)

Finally, in the fifth layer, the sum of the output of all the rules computed in the previous layer is calculated and reported as the ANFIS output.

$$y^{q} = \sum\limits_{r = 1}^{Z} {O_{z} }$$
(7)

In order to model real-world problems using ANFIS, we need to consider two important issues: choosing the proper method for the fuzzy inference system (FIS) generation and determining the training algorithm.

Some of the most popular and widely used FIS generation methods are fuzzy C-means clustering (FCM), sub-clustering (SC) and grid partitioning (GP). Recent studies (Safari et al. 2019; Ebtehaj et al. 2019a; 2021) have shown that not only the accuracy of FCM modeling is better than the other two methods, but it also has a higher modeling speed. Moreover, the parameters that must be optimized during the training process in this method are less than the other two methods. In addition, newer versions of MATLAB use only FCM for the FIS generation. Therefore, in this study, FCM is utilized for the FIS generation. Details of the FCM method have been provided by Lotfi et al. (2020).

In order to train ANFIS, there are two classical algorithms, including backpropagation (BP) and hybrid, which is an integration of the BP and least square (LS) (BP-LS) in the provided ANFIS versions, which have been used by many researchers (Azimi et al. 2017; Ebtehaj et al. 2019b; Lotfi et al. 2019), which generally show the superiority of the hybrid algorithm over the BP. One of the biggest problems of the classical training algorithm used in ANIFS is their entrapment in the local extremums (Qasem et al. 2017). Therefore, in this paper, the firefly algorithm (FA) as a well-known evolutionary algorithm in various sciences, especially water and hydrology (Ebtehaj and Bonakdari 2016; Ebtehaj et al. 2017; Mojtahedi et al. 2019) is employed to train ANFIS.

Firefly algorithm (FA)

The FA is an evolutionary algorithm introduced by Yang (2009). Fireflies produce short rhythmic lights so that the pattern of the light production in each of them is different from the other. Fireflies use these lights to attract prey and mates. The light intensity (I) for each of fireflies has an inverse square relationship with the distance r from the light source so that as the distance increases, the light intensity decreases. In order to present the concept of the firefly algorithm based on light intensity and distance, it is first necessary to make the following assumptions:

  1. 1.

    None of fireflies has gender, and each firefly can absorb all others.

  2. 2.

    (2) The attractiveness of each fly is directly related to its brightness, so that for each firefly, the firefly that has less light is attracted to the brighter firefly. The absorption power of each firefly is directly related to the brightness of the firefly, while the relationship between light intensity and the distance between two fireflies is inverse. It should be noted that if the brightness of all fireflies is equal to each other, the movement of fireflies is random.

  3. 3.

    (3) The brightness of fireflies is considered as a target function.

According to the presented assumptions, the formulation of the firefly algorithm is stated below. First, the attractiveness of the firefly (β) is defined. The firefly attractiveness is a relative variable that is related to its absorption power (γ) and distance from others, and is expressed as follows:

$$\beta (r) = \beta \,e^{{ - \gamma \,r^{2} }}$$
(8)

In this respect, β represents attractiveness, β (r) is the value of attractiveness at the distance r from the light source, and γ is the power of absorption.

The distance between two fireflies at the points xi for the firefly i and xj for the firefly j is calculated using Eq. (9):

$$r_{ij} = \left\| {x_{i} - x_{j} } \right\| = \sqrt {\sum\limits_{L = 1}^{d} {(x_{i,L} - x_{j,L} )^{2} } }$$
(9)

In this relation, rij is the distance between the two fireflies i and j, xi is the position of the firefly i and xj denotes the position of the firefly j, xi,L represents the Lth part of the coordinates of the firefly i. Assuming that the firefly i is more attractive than the firefly j, the movement of the firefly i toward the firefly j is calculated as follows:

$$x_{i} = x_{i} + \beta_{0} e^{{ - \gamma r_{ij}^{2} }} \left( {x_{i} - x_{j} } \right) + \alpha \left( {{\text{rand}} - \frac{1}{2}} \right)$$
(10)

The above relationship consists of three general terms so that the first term represents the current position of the firefly i, while the second term is known as the absorption process and finally, the third term is considered as the random term. The third term is defined using two parameters α and rand. The parameter α is a positive number while rand produces a random number in the range [0, 1]. β0 is the amount of attractiveness in the light source.

Hybrid of ANFIS and FA (ANFIS-FA)

In this section, the ANFIS training process using the firefly algorithm (ANFIS-FA) is described in detail. All steps related to the ANFIS training process are performed in MATLAB using the FA as the rainfall prediction. The modeling process in this hybrid method begins with the production of an initial model based on ANFIS so that the parameters related to ANFIS, including antecedent and consequent parameters at this stage have no optimal values (Azimi et al. 2018). For this purpose, the optimization is started using the FA to optimize all values of the antecedent and consequent parameters during the optimization process. It should be noted that the optimization process is performed only using training data, which is a percentage of the total data.The information used in this study is in the period from 1997 to 2020, in which the data of 1977–2008 are employed to train the model and the data for the remaining 12 years (2009–2020) to test the efficiency of the model for unseen data. In addition to determining training and testing data, the method utilized for the FIS generation should also be selected, which, as stated, is the FCM method. Furthermore, the value of the MF must be determined in such a way that not only does the model have a good ability to estimate the amount of rainfall, but also the considerable complexity of the model must be avoided. Hence, the value of this parameter is considered in the range [2, 5]. After specifying the ANFIS parameters, the user-defined parameters of the FA should also be initialized. It should be noted that the careful selection of the parameters of this algorithm has a significant effect on results and if their values are not selected correctly, the optimization algorithm may have adverse impacts on the ANFIS efficiency and instead of increasing the modeling accuracy by this method, it might cause to reduce its exactness. The parameter α, which appears as a movement parameter in the random term of Eq. (10), should be considered in the range [0, 1], while this range is equal to [0.1, 10] for the absorption strength parameter (γ) (Yang 2009). In the present study, the optimal values of the parameters γ, β0 and α are obtained to be 1.2, 2.05, 0.25, respectively.

The ANFISA modeling process continues until the value of the objective function considered as root-mean-square error (RMSE) reaches the desired value or the number of iterations intended for modeling ends. The number of optimal replications obtained in this study is equal to 1700. For more reliability during modeling, the number of iterations is considered equal to 2000. In order to obtain this value, the number of iterations in the range of 1000–3000 is evaluated and by examining the target function value, this value is chosen in such a way that further increase would have no effect on decreasing the RMSE value. After defining the parameters including the number of MFs, the number of iterations and variables of the firefly algorithm and finally the classification of training and testing data, the modeling process begins. Initially, an initial population of fireflies are produced to optimize the antecedent and consequent variables of ANFIS. For the produced generation, light intensity and absorption coefficient of fireflies are calculated. Using the calculated absorption intensity for each firefly, brighter fireflies absorb dimmer ones. In fact, using the intensity of absorption, the movement of fireflies begins [Eq. (10)]. Then, all fireflies are ranked and the best of them are selected as the current best and the process of moving of fireflies continues for the whole defined population, until the stop criterion is reached (number of repetitions or the minimum acceptable RMSE value). At the end of the training process, the efficiency of the developed model for testing data is tested. If the testing data error is acceptable, the modeling is terminated; otherwise, the optimization process should be repeated so that the user-defined parameters should be changed to obtain a better answer.

Wavelet transform

The wavelet transfer function is a mathematical function that decomposes the main function into several functions using a mother function. The wavelet function uses a mathematical relation to break down the main time series into different components, which also removes the limitations presented in the Fourier series (Fallahi 2020). The wavelet function has the ability to extract information from transient and non-periodic signals. Hence, this method is suitable for time frequency localizing (Jawerth and Sweldens 1994).

The basis of the analysis using wavelet transform is the analysis of time series into two parts, low and high frequency. In fact, each input of the problem, which is a time series, is classified into two main groups: approximation (A) and details (D). If the decomposition level is more than one, the original time series is equal to the time series produced in the first level of the decomposition as A1 + D1, while in the second level of the decomposition, only the produced time series A in the first level is decomposed again, i.e. the original series is equal to A2 + D1 + D2. This process continues until the final level of decomposition is achieved.

The attached wavelet function (CWT) for the signal x(t) is written using the wavelet function ψ as follows:

$${\text{CWT}}(a,b) = \int_{ - \infty }^{ + \infty } {x(t)\frac{1}{{\sqrt {\left| a \right|} }}\psi (\frac{t - b}{{\sqrt a }})} {\text{dt}}$$
(11)

In this relation, a is equal to the scale parameter and b denotes the movement parameter. Due to the time consuming calculation of the values of these two parameters for all values of each parameter, the power of two for each of them is considered. Hence, the discrete wavelet transform is defined as follows:

$${\text{DWT}}(j,k) = \frac{1}{{\sqrt {\left| {2^{j} } \right|} }}\int_{ - \infty }^{ + \infty } {x(t)\frac{1}{{\sqrt {\left| a \right|} }}\psi (\frac{{t - 2^{jk} }}{{2^{j} }})} {\text{dt}}$$
(12)

Proposed hybrid wavelet and ANFIS-FA (WANFIS-FA)

For rainfall modeling using the WANFIS-FA, after classifying the data into two categories of training and testing, the most important lags should be identified so that the model input variables can be defined to forecast precipitation amounts in the future. The autocorrelation function (ACF) as the best known method for determining lags is used to determine this lag. The ACF for the data employed in this study is presented in Fig. 2. The figure shows the significant importance of lags 1 and 2 compared to others. In addition, the results show that there is a period in which lag 12 is used. In order to be more confident in modeling, in addition to the mentioned lags, three lags are also considered as model inputs. Finally, the functional equation for calculating rainfall at the time t is written as:

$$R(t) = f\left( {R(t - 1),R(t - 2),R(t - 3),R(t - 12)} \right)$$
(13)
Fig. 2
figure 2

ACF diagram of the Ardebil rainfall

After determining the effective lags, the decomposition level should be evaluated; hence, Eq. (15) is used (Malekzadeh et al. 2019).

$${\text{DL}} = {\text{int}} \left[ {\log \left( N \right)} \right]$$
(14)

where DL represents the decomposition level and N represents the number of training data. The last parameter to be determined is the type of the mother wavelet, which has a meaningful impact on achieving the optimal model. In this study, the functions Coiflet (Coif), Daubechies (db), Symlet (sym), Haar and Meyer (dmey), which have been used in various studies, are examined.

Goodness of fit

In this study, to test the performance of artificial intelligence models used in this study, the statistical indices comprising the correlation coefficient (R), variance accounted for (VAF), mean relative error (MAE), Nash–Sutcliffe efficiency coefficient (NSC), root-mean-square error (RMSE) and normal root-mean-square error (NRMSE) are utilized (Azimi and Shiri 2020):

$$R = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {F_{i} - \overline{F} } \right)\left( {O_{i} - \overline{O} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {F_{i} - \overline{F} } \right)^{2} \sum\nolimits_{i = 1}^{n} {\left( {O_{i} - \overline{O} } \right)^{2} } } } }}$$
(15)
$${\text{VAF}} = \left( {1 - \frac{{{\text{var}} \left( {F_{i} - O_{i} } \right)}}{{{\text{var}} \left( {F_{i} } \right)}}} \right) \times 100$$
(16)
$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {F_{i} - O_{i} } \right|}$$
(17)
$${\text{NSC}} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {O_{i} - F_{i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {O_{i} - \overline{O} } \right)^{2} } }}$$
(18)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left( {F_{i} - O_{i} } \right)^{2} } }$$
(19)
$${\text{NRMSE}} = \frac{{{\text{NRMSE}}}}{{\overline{O} }}$$
(20)

where

\(O_{i}\): rainfall or runoff values,

\(F_{i}\): modeled values,

\(\overline{O}\): average observed values, and

n: the number of observed values.

In the following, the efficiency of the artificial intelligence models in reproducing rainfall amounts in Ardebil is tested. In this paper, Ardebil rainfall time series data over a 44-year period from 1976 to 2020 are utilized to train and test artificial intelligence models. The mentioned data are classified into two parts: training (70% of the total data) and testing (remaining 30%). Other ratios, e.g., 50% for the training and 50% for the testing, 60% for the training and 40% for the testing, 70% for the training and 30% for the testing, and 80% for the training and 20% for the testing, were also applied in this study. However, the ML models showed a better performance when 70% of the dataset and the rest (30% of data) were used for the training and testing these models. Important findings of this study are provided in the following.

Results and discussion

Normalization

To facilitate dealing with observed precipitation values in the present study, these data are normalized. Equation (21) is utilized to normalize observed precipitation values:

$$x_{n} = a\left[ {\left( {x - x_{\min } } \right)/\left( {x_{\max } - x_{\min } } \right)} \right] + b$$
(21)

where \(x_{n}\) denotes normalized values, \(x\) represents observed values, \(x_{\min }\) represents the minimum observed value, \(x_{\max }\) is the maximum observed value, and a and b are constant values. In this paper, for a and b, three different values such as \(a = 1,\,b = 0\),\(a = 2,\,b = - 1\) and \(a = 0.6,\,b = 0.2\) are chosen and the simulation outcomes for conditions without normalization are also evaluated. The results of the values of the statistical indices obtained for different normalization coefficients and without normalization can be seen in Fig. 3. The results of the simulations showed that the estimations of the correlation coefficient and Nash coefficient for the model without normalization coefficients are equal to 0.840 and 0.676, separately. The NRMSE statistical index for the model with \(a = 1,\,b = 0\) is estimated to be 0.422. In addition, for the model with \(a = 2,\,b = - 1\), RMSE, VAF and NRMSE are obtained to be 34.63, 25.068 and 0.911, separately. However, for the model with \(a = 0.6,\,b = 0.2\), R, MAE and NSC are calculated to be 0.920, 10.337 and 0.785. Therefore, as it is observed, the model with \(a = 0.6,\,b = 0.2\) show the highest correlation and the lowest error rate.

Fig. 3
figure 3

Results of calculated statistical indices for different normalization coefficients and without normalization

Figure 4 also demonstrates the comparison of the simulated rainfall amounts with observed values. According to Fig. 6, this numerical model with \(a = 0.6,\,b = 0.2\) displays acceptable performance. Although this model estimates the precipitation values differently in some areas; however, in general, it is able to forecast the target function values with an reasonable exactness.

Fig. 4
figure 4

Comparison of simulated precipitation values with observed values

Mother wavelet

Wavelet is a small wave whose energy is concentrated in a small area and is a good tool for studying transient phenomena. The wavelet has a minimum oscillation that declines to zero, and this decrease should be limited to the positive and negative degrees within its range. This feature makes the wavelet flexible and allows it to behave like a function. In fact, the wavelet transform is a good way to analyze data and make them without fluctuating. In other words, wavelets are mathematical functions that provide a time-scale expression of time series and their ratios, which is useful for analysis for non-stationary time series. The wavelet transform is able to gain temporal, spatial and frequency information of a signal. In this part, different families of the wavelet transform are examined. In general, the wavelet transform has families called Coif, Haar, db, dmey, and sym. Based on the analysis of all members of the different wavelet transform families, it is found that the dmey owns the highest exactness and the minimum error rate compared to other members of the wavelet family. The results of the statistical indices for the superior members of different wavelet families are given in Fig. 5. For example, for haar, the estimations of R, NSC and NRMSE are 0.580, 0.255 and 0.875, individually. However, for db, RMSE, NSC and MAE are equal to 21.913, 0.606 and 13.887, separately. Moreover, for sym, VAF, R and RMSE are approximated to be 84.484, 0.919 and 15.701, separately. For coif, VAF and correlation coefficient are calculated equal to 84.230 and 0.920, separately. Based on the simulation results of rainfall, among all the different members of the wavelet family, dmey is introduced as the best member of this family so that the estimations of the NSC and RMSE statistical indices for it are estimated to be 0.959 and 7.835, respectively. Also, R, MAE and NRMSE for dmey are equal to 0.980, 5.949 and 0.206, separately. Therefore, based on the results of all members of the wavelet family, dmey is introduced as the most accurate and optimal member of the wavelet family and in following, this member is used to simulate long-term rainfall in Ardebil.

Fig. 5
figure 5

Estimations of statistical indices for the superior members of different wavelet families

Figure 6 compares the precipitation amounts reproduced by dmey and observed values. Based on the simulation findings, dmey computes the values of the target function with the highest exactness and the least error and demonstrates a high correlation with the observed values.

Fig. 6
figure 6

Comparison of rainfall amounts simulated by mother wavelet dmey and observed values

Decomposition level (DL)

In this study, two decomposition levels (DL) are considered for the wavelet model and the amount of precipitation in Ardebil is examined by them. Figure 7 provides the values of different statistical indices computed to estimate the amount of precipitation by these two decomposition levels. The R, NRMSE and MAE values for DL 1 are calculated to be 0.968, 0.261 and 7.553, respectively. Meanwhile, VAF, NSC and RMSE for DL 2 are obtained to be 96.119, 0.959 and 7.835, respectively.

Fig. 7
figure 7

Results of different statistical indices calculated to estimate precipitation values by DL 1 and DL 2

As shown, DL 2 estimates the precipitation values of Ardebil with better accuracy and less error. Figure 8 also provides the comparison between the rainfall amounts simulated by ِDL 2 and the observed values. Also, DL 2 can recreate the target parameter values with reasonable efficiency.

Fig. 8
figure 8

Comparison of rainfall amounts simulated by DL 2 with observed values

Membership function (MF)

In this section, the effect of the number of ANFIS model MFs on the modeling results is examined. Initially, the number of the MFs is assumed to be two, and by increasing it, the ANFIS efficiency is examined via different statistical indices. For example, for situations where the number of MFs is equal to two, the estimations of R, RMSE, and MAE are estimated to be 0.980, 7.835, and 5.949, separately. In addition, the NRMSE and NSC statistical indices for this number of MFs are estimated to be 0.206 and 0.959, respectively. Moreover, with increasing the number of the MFs, the accuracy of the ANFIS model did not increase significantly. For example, when the number of the MFs is considered equal to 5, VAF, MAE and NSC indices are computed to be 98.537, 3.391 and 0.985, respectively. Hence, as seen, increasing MF value has no meaningful influence on the simulation efficiency and only increases the computational costs. So, the number of MFs for this paper is chosen to be equal to 4. It should be noted that the comparison of different statistical indicators for the number of the MFs from two to five is shown in Fig. 9.

Fig. 9
figure 9

Comparison of different statistical indices for the number of MFs from two to five

Figure 10 demonstrates a comparison between the precipitations simulated by the optimal value of MF = 4 and the observed precipitations of Ardabil. According to the simulation results, the ANFIS model with four MFs displays an acceptable performance in estimating precipitation amount. This model, in a situation where the number of the optimal MFs is four, obtains the estimations of the target function with the highest correlation and the lowest error value.

Fig. 10
figure 10

Comparison between precipitation values simulated by optimal MFs equal to four and observed precipitation values in Ardebil

WANFIS-FA models

In this part, the results obtained from the WANFIS-FA models are examined. As mentioned before, these models are produced by combining the ANFIS and the wavelet transform and the firefly algorithm. Figure 11 shows a comparison of different statistical indices for all WANFIS-FA hybrid models. In this study, fourteen WANFIS-FA models are developed using influencing time series data lags. Based on the modeling findings, the VAF and correlation coefficient for the WANFIS-FA 1 model are calculated equal to 44.890 and 0.670, respectively. As observed, the wavelet transform and firefly algorithm meaningfully improves the ANFIS efficiency. Meanwhile, the RMSE, MAE and VAF statistical indices for WANFIS-FA 1 are calculated as 17.765, 12.651 and 44.890, separately. Moreover, NSC, VAF and RMSE for WANFIS-FA 2 are obtained to be − 1.813, 24.218 and 20.890, respectively. Also, MAE, NRMSE and R for the WANFIS-FA 3 model are obtained as 14.999, 0.785 and 0.587, individually. The correlation coefficient and VAF for WANFIS-FA 4 are approximated as 0.674 and 45.161, separately. Moreover, R, RMSE and NSC for WANFIS-FA 5 are obtained as 0.963, 10.681 and 0.920, respectively. Also, for WANFIS 6, R, MAE and VAF are yielded as 0.946, 9.010 and 89.463, separately. In contrast, for WANFIS 7, the estimations of NSC, MAE and R are approximately equal to 0.614, 14.434 and 0.853, individually. For the WANFIS-FA 8, WANFIS-FA 9 and WANFIS-FA 10 models, the correlation coefficients are yielded to be 0.886, 0.822 and 0.832, respectively. The VAF for WANFIS-FA 11, WANFIS-FA 12 and WANFIS-FA 13 are also computed equal to 97.496, 97.133 and 97.430, separately.

Fig. 11
figure 11

Comparison of different statistical indices for all WANFIS-FA hybrid models

According to the modeling outcomes, among all hybrid models, WANFIS-FA 14 is chosen as the best AI model in this research. For instance, the VAF and correlation coefficient for WANFIS-FA 14 are calculated equal to 98.082 and 0.990, respectively. Furthermore, for this model, RMSE, NSC and R are equal to 5.515, 0.980 and 0.990, separately.

Therefore, among all AI models, WANFIS-FA 14 shows the minimum error value and the maximum correlation with observed precipitation values. Furthermore, the lags (t − 1), (t − 2), (t − 3) and (t − 12) are identified as the most influencing lags in reproducing long-term rainfall in Ardabil by the WANFIS-FA14 model. Figure 12 shows the scatter plot and comparison of the numerical and observed values of the superior model (WANFIS-FA14).

Fig. 12
figure 12

Scatter plot and comparison of numerical and observed values of superior model (WANFIS-FA14)

Comparison with ANFIS, ANFIS-FA, and WANFIS

In this step, the superior model (WANFIS-FA 14) efficiency is compared with the ANFIS, ANFIS-FA, and WANFIS models. Figure 13 shows the comparison of the calculated statistical indices for the ANFIS, ANFIS-FA, WANFIS and WANFIS-FA models. According to the simulation outcomes, VAF and RMSE for ANFIS, respectively, are obtained to be 38.294 and 31.579 and these statistical indices for the ANFIS-FA model are equal to 42.858 and 0.798, individually. However, for WANFIS, the R, MAE and NSC indices are estimated as 0.971, 7.299 and 0.936, respectively. As can be seen, the WANFIS model has higher accuracy than ANFIS-FA and the ANFIS model has the worst performance in estimating the precipitation values.

Fig. 13
figure 13

Comparison of statistical indices calculated for ANFIS, ANFIS-FA, WANFIS and WANFIS-FA models

Comparison of the ability of the WANFIS-FA model in simulating precipitation values in Ardabil with the ANFIS, ANFIS-FA, and WANFIS models is presented in Fig. 14. According to the results, the WANFIS-FA hybrid model approximates precipitation values with high accuracy. This model displays the lowest error rate and the highest correlation with the observed values.

Fig. 14
figure 14

Comparison of WANFIS-FA model capability in simulating Ardabil rainfall values with ANFIS, ANFIS-FA, and WANFIS models

In the current research, a hybrid fuzzy model is extended to reproduce long-term precipitation in Ardabil. The model estimates the target parameter values with appropriate exactness. Furthermore, by analyzing the simulation results, the most effective time series data lags are introduced. Although significant results have been achieved in the current study, there were some limitations, e.g., lack of data before 1976. Moreover, the ML mode was not able to provide an explicit relationship to estimate the target value and this was another drawback of current work.

Conclusion

Rainfall simulation is very important in Iran because, in recent years, the pattern of rainfall in various parts of the arid and semi-arid areas has undergone significant changes. Also, in the northern regions of Iran, including the northern provinces, rainfall is the most important source of water supply for different demands. In this study, as the first time, a new optimized neuro-fuzzy model was introduced, which was built by integrating ANFIS, wavelet transform and firefly algorithm. Then, the long-term rainfall of Ardabil in a long-term period of 44 years from 1976 to 2020 was estimated by this model. To do this, initially, the data were classified into two categories: training (32 years) and testing (12 years). Next, the influencing lags of the time series data were detected using an autocorrelation function. Also, different members of the mother wavelets were examined and in the end, the most optimal one was selected. Using these lags and optimized mother wavelet, 14 different WANFIS-FA models were developed. By evaluating the modeling outcomes, the best models as well as the most influencing input lags were detected. For the best model, the estimations of the correlation coefficient and VAF were yielded to be 0.99 and 98.08, respectively. The findings of the sensitivity analysis demonstrated that the lags (t − 1), (t − 2), (t − 3) and (t − 12) were introduced as the most influencing time series data lags. In addition, examinations have shown that the wavelet transform significantly promoted the performance of the firefly algorithm, and this hybrid model simulated long-term rainfall in Ardabil with remarkable accuracy. Other optimization algorithms can be used in the next works. Additionally, the current methodology may be applied for other cities in the area.