Novel reliable model by integrating the adaptive neuro-fuzzy inference systems with wavelet transform and firefly algorithms for rainfall forecasting in the north of Iran

Rainfall is perhaps the most important source of drinking and agriculture water for the inhabitants of different parts of the world, particularly in arid and semi-arid area like Iran. Hence, the simulation of this hydrological phenomenon is crucial. The current research attempts to reproduce the long-term monthly precipitation of Ardabil, Iran, during 44 years from 1976 to 2020 for the first time via a hybrid fuzzy technique. For developing this model (WANFIS-FA), adaptive neuro-fuzzy inference system (ANFIS), firefly algorithm and wavelet transform were integrated. Firstly, the impacting lags of time series data were recognized by using the autocorrelation function and 14 WANFIS-FA models were defined using them. Then, the results of WANFIS-FA models were evaluated and the best WANFIS-FA model and the most influencing lags were found. For example, the variance accounted for index (VAF), correlation coefficient (R) and Nash–Sutcliffe coefficient (NSC) values for the superior WANFIS-FA model were computed to be 98.082, 0.990 and 0.980, respectively. In addition, the lags (t − 1), (t − 2), (t − 3) and (t − 12) were the most effective ones. Next, different members of the mother wavelet were tested and finally demy was selected as an optimal wavelet. Also, the analysis of the outcomes of the hybrid models demonstrated that the wavelet transform meaningfully enhanced the efficiency of the neuro-fuzzy model. Finally, the efficiency of WANFIS-FA was compared with ANFIS, WANFIS and ANFIS-FA, which displayed that WANFIS-FA performed better.


Introduction
Precipitation is the process by which water vapor condenses under atmospheric conditions and precipitates in liquid or solid form due to the force of gravity. In terms of being located on the belt of arid regions of the world, Iran is mostly surrounded by arid and semi-arid areas, and the lack of rainfall in these regions makes it difficult to establish vegetation. The amount of rainfall on the country's watersheds is less than one third of the world average. The estimation of precipitation is necessary for the implementation of water resources studies, drought, land management, environment, watershed management and comprehensive agricultural projects. Today, there are various methods such as numerical and intelligent models to simulate the phenomenon of precipitation. Machine learning (ML), metaheuristic models, optimization algorithms, and neuro-fuzzy methods are among the most important useful tools for modeling various nonlinear and complex phenomena in recent years (Khoshbin et al. 2016;Akhbari et al. 2017;Shabanlou et al. 2018;Azimi and Shiri 2021;Azimi et al. 2022). Artificial intelligence (AI)-based models have been broadly implemented in different fields to model several linear and nonlinear problems because they are quick, inexpensive, reliable and precise.
These models have recently been used to model hydrological phenomena such as precipitation. For example, Kisi and Shiri (2011) simulated the daily precipitation of the Izmir station in Turkey by combining neuro-fuzzy models, gene expression programming and wavelet function of two hybrid models. Mekanik et al. (2013) estimated long-term rainfall over a 110-year period within a region 1 3 46 Page 2 of 16 of South Australia using multiple linear regression models and artificial neural networks (ANNs). The researchers reviewed the results of the study and concluded that the ANN model performed better. Subsequently, Abbot and Marohasy (2014) predicted monthly rainfall in Queensland, Australia, by selecting inputs and optimizing them using ANNs. They concluded that predictions done by the ANN are more accurate than common models. Akrami et al. (2014) by employing ANFIS, ANN and wavelet transform, presented different hybrid numerical models to simulate rainfall in the Klang River Basin in Malaysia. In their study, it was shown that the results of wavelet-ANFIS and wavelet-ANN models predicted precipitation values with better accuracy. Shafaei et al. (2016) estimated rainfall data of the 40-year time series of the city of Nahavand by a hybrid model.
They integrated the ANN, SARIMA, and wavelet transform models. They showed that the model developed from the combination of these three algorithms simulated the values of the objective function more accurately. In another research conducted by Phukoetphim et al. (2016), the ANN and symbolic regression were used to integrate the computed discharges of rainfall-runoff models in several basins of Thailand and New Zealand. Their findings showed that multi-model methods for flow estimation depend on the type and characteristics of the basin. Sarzaeim et al. (2017), in order to predict runoff under climate change conditions, implemented AI approaches such as genetic algorithm and ANN and support vector machine in the Aidonmush basin in Iran. They concluded that the SVM method displayed a better performance than the others. Then, Purnomo et al. (2017) estimated the amount of rainfall in the period from 2001 to 2013 in the center of Java, Indonesia by two different ANN models. They showed that the neural models estimated the amount of rainfall with acceptable accuracy. Also, Yaseen et al. (2018) using the firefly algorithm managed to optimize ANFIS to estimate rainfall in one of the areas of Malaysia. They indicated that the hybrid model estimated rainfall values more accurately than ANFIS. Danladi et al. (2018) utilized ANFIS to estimate precipitation changes over a short period of time. They considered rainfall amounts as a function of temperature and relative humidity and modeled them by employing these two variables. Diez-Sierra and del Jesus (2020) estimated daily rainfall amounts in a semiarid region in Spain using regression models, the ANN and SVM. By testing the simulation results, the authors showed that the ANN model performed better.
More recently, Endalie et al. (2022) used a forecasting model based on long short-term memory (LSTM) capable of forecasting Jimma's daily rainfall. They also compared this machine learning with other methods such as multilayer perceptron (MLP), k-nearest neighbors (KNN), support vector machine (SVM), and decision tree (DT).
Studying previous researches reveals that the combined model of ANFIS network, firefly algorithm and wavelet model has not been used to simulate long-term rainfall. Therefore, in the present study, the ANFIS, firefly and wavelet models are integrated and the long-term rainfall of Ardabil is simulated on a monthly basis over a 44-year period from 1976 to 2020 for the first time. First, via the autocorrelation function, the impacting lags of the time series data are identified and using them, the rainfall values are estimated. In the following, by performing several different analyzes, the best model in computing rainfall amounts is introduced.

Study area
The city of Ardabil is situated in the middle of Ardabil Plain; this city is situated at an altitude of 1500 m above sea level and between the Talesh and Sabalan mountains in the northwest of the Iranian plateau. The Mediterranean flow as the main source of rainfall along with moderating the temperature and humidity enters from the west. The air flow of Siberia in Central Asia enters from the northeast with a cold nature, and after passing the Caspian Sea and absorbing water vapor of Ardabil in summer, it causes a sharp reduction in heat and cooling of the air. In the region, there are two types of local winds and front winds that start with the penetration of the front toward the region. Rainfall occurs in all seasons, but it is more intense in spring and autumn. Based on the Ardabil meteorological station data, the average rainfall is 327.7 mm. This area has very cold winters and mild summers with an average temperature of about seven degrees Celsius. Figure 1 shows the study area.
In this paper, a hybrid method is presented for precipitation forecasting in Ardabil. The proposed hybrid method is on the basis of the ANFIS which is optimized using the wavelet transform and firefly algorithm. In the following, the details of each method, along with how to combine them, are given in detail.

ANFIS
Fuzzy systems are one of the relatively new and popular methods in modeling dynamic and complex phenomena. The theory of fuzzy systems based on linguistic expression was proposed in 1965 by Zadeh (Zadeh 1965). On the other hand, classical neural networks have been considered by many researchers in recent decades as a powerful tool in modeling real problems. Jang (1993), by integrating the ability of neural network in training and using linguistic expression presented in fuzzy systems, introduced the ANFIS so that the ability of training in neural network methods to increase the modeling feature provided in fuzzy systems by Zadeh (1965) is employed.
The ANFIS network has two main parts called "antecedent" and "consequent". The antecedent part uses Takagi-Sugeno Fuzzy systems. The general structure defined for the rules in this part is in the form of Eq. (1): where K iz (i = 1, 2,…, n) is known as a fuzzy set, so that each of these fuzzy sets is used as a linguistic expression to provide a linguistic explanation of each input. The ANFIS (1) If A 1 is K iz , ... , and A n is K nz Then B z = q 0 + q 1z A 1 + ... + q nz A n network defined by Jang (1993) has 5 general layers, each of which is presented below with its details.
Layer 1: The nodes provided in the first layer are considered as the degree of membership of each of the inputs of the problem, which are defined as Eq. (2).
where A denotes the input of the ith node, K iz is the fuzzy set associated with the input, and μ is the membership function (MF). MFs that can be employed in ANFIS have different shapes such as trapezoidal, triangular, and Gaussian. Among these functions, the Gaussian MF has received more (2) O I,1 = K iz (A) Fig. 1 The location of Ardabil city, Iran, as the study area in this research attention from researchers than others (Gholami et al. 2017(Gholami et al. , 2018Moradi et al. 2019). Some of the most important features of this function are its smoothness, low number of parameters (C and sigma) and related concise notation. The general form of this function is as follows: In this regard, K iz is a fuzzy set, μ denotes the MF, A is the input, and {c, σ} are the parameters of the Gaussian MF whose optimal values are calculated through the training process.
Fuzzy rules are defined using the units defined in the first layer of the ANFIS network. Each of these fuzzy rules will be completely connected to all the units provided at the first layer. By defining the variable wz r (r = 1, 2,…, Z), the activation level of each fuzzy rule is obtained. In addition, in this layer, the AND operator and the product function are used to connect all fuzzy rules to the first layer unite. The degree of membership of each of the defined rules is calculated as follows: By considering at least one unit for each of the rules defined at the second layer, a complete connection can be made between all the rules in the third layer and the second layer. Using the established connection, the firing strength of each rule is obtained by computing the ratio of the membership degree of each rule to the sum of membership degrees of all rules, as Eq. (5): The output of all the rules in the fourth layer is completely connected to the previous layer and calculated as follows: Finally, in the fifth layer, the sum of the output of all the rules computed in the previous layer is calculated and reported as the ANFIS output.
In order to model real-world problems using ANFIS, we need to consider two important issues: choosing the proper method for the fuzzy inference system (FIS) generation and determining the training algorithm. (3) Some of the most popular and widely used FIS generation methods are fuzzy C-means clustering (FCM), sub-clustering (SC) and grid partitioning (GP). Recent studies (Safari et al. 2019;Ebtehaj et al. 2019a; have shown that not only the accuracy of FCM modeling is better than the other two methods, but it also has a higher modeling speed. Moreover, the parameters that must be optimized during the training process in this method are less than the other two methods. In addition, newer versions of MATLAB use only FCM for the FIS generation. Therefore, in this study, FCM is utilized for the FIS generation. Details of the FCM method have been provided by Lotfi et al. (2020).
In order to train ANFIS, there are two classical algorithms, including backpropagation (BP) and hybrid, which is an integration of the BP and least square (LS) (BP-LS) in the provided ANFIS versions, which have been used by many researchers Ebtehaj et al. 2019b;Lotfi et al. 2019), which generally show the superiority of the hybrid algorithm over the BP. One of the biggest problems of the classical training algorithm used in ANIFS is their entrapment in the local extremums (Qasem et al. 2017). Therefore, in this paper, the firefly algorithm (FA) as a wellknown evolutionary algorithm in various sciences, especially water and hydrology Ebtehaj et al. 2017;Mojtahedi et al. 2019) is employed to train ANFIS.

Firefly algorithm (FA)
The FA is an evolutionary algorithm introduced by Yang (2009). Fireflies produce short rhythmic lights so that the pattern of the light production in each of them is different from the other. Fireflies use these lights to attract prey and mates. The light intensity (I) for each of fireflies has an inverse square relationship with the distance r from the light source so that as the distance increases, the light intensity decreases. In order to present the concept of the firefly algorithm based on light intensity and distance, it is first necessary to make the following assumptions: 1. None of fireflies has gender, and each firefly can absorb all others. 2.
(2) The attractiveness of each fly is directly related to its brightness, so that for each firefly, the firefly that has less light is attracted to the brighter firefly. The absorption power of each firefly is directly related to the brightness of the firefly, while the relationship between light intensity and the distance between two fireflies is inverse. It should be noted that if the brightness of all fireflies is equal to each other, the movement of fireflies is random. 3. (3) The brightness of fireflies is considered as a target function.
According to the presented assumptions, the formulation of the firefly algorithm is stated below. First, the attractiveness of the firefly (β) is defined. The firefly attractiveness is a relative variable that is related to its absorption power (γ) and distance from others, and is expressed as follows: In this respect, β represents attractiveness, β (r) is the value of attractiveness at the distance r from the light source, and γ is the power of absorption.
The distance between two fireflies at the points x i for the firefly i and x j for the firefly j is calculated using Eq. (9): In this relation, r ij is the distance between the two fireflies i and j, x i is the position of the firefly i and x j denotes the position of the firefly j, x i,L represents the Lth part of the coordinates of the firefly i. Assuming that the firefly i is more attractive than the firefly j, the movement of the firefly i toward the firefly j is calculated as follows: The above relationship consists of three general terms so that the first term represents the current position of the firefly i, while the second term is known as the absorption process and finally, the third term is considered as the random term. The third term is defined using two parameters α and rand. The parameter α is a positive number while rand produces a random number in the range [0, 1]. β 0 is the amount of attractiveness in the light source.

Hybrid of ANFIS and FA (ANFIS-FA)
In this section, the ANFIS training process using the firefly algorithm (ANFIS-FA) is described in detail. All steps related to the ANFIS training process are performed in MATLAB using the FA as the rainfall prediction. The modeling process in this hybrid method begins with the production of an initial model based on ANFIS so that the parameters related to ANFIS, including antecedent and consequent parameters at this stage have no optimal values . For this purpose, the optimization is started using the FA to optimize all values of the antecedent and consequent parameters during the optimization process. It should be noted that the optimization process is performed only using training data, which is a percentage of the total data. The information used in this study is in the period from 1997 to 2020, in which the data of 1977-2008 are employed to train the model and the data for the remaining 12 years (2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019)(2020) to test the efficiency of the model for unseen data. In addition to determining training and testing data, the method utilized for the FIS generation should also be selected, which, as stated, is the FCM method. Furthermore, the value of the MF must be determined in such a way that not only does the model have a good ability to estimate the amount of rainfall, but also the considerable complexity of the model must be avoided. Hence, the value of this parameter is considered in the range [2,5]. After specifying the ANFIS parameters, the user-defined parameters of the FA should also be initialized. It should be noted that the careful selection of the parameters of this algorithm has a significant effect on results and if their values are not selected correctly, the optimization algorithm may have adverse impacts on the ANFIS efficiency and instead of increasing the modeling accuracy by this method, it might cause to reduce its exactness. The parameter α, which appears as a movement parameter in the random term of Eq. (10), should be considered in the range [0, 1], while this range is equal to [0.1, 10] for the absorption strength parameter (γ) (Yang 2009). In the present study, the optimal values of the parameters γ, β 0 and α are obtained to be 1.2, 2.05, 0.25, respectively. The ANFISA modeling process continues until the value of the objective function considered as root-mean-square error (RMSE) reaches the desired value or the number of iterations intended for modeling ends. The number of optimal replications obtained in this study is equal to 1700. For more reliability during modeling, the number of iterations is considered equal to 2000. In order to obtain this value, the number of iterations in the range of 1000-3000 is evaluated and by examining the target function value, this value is chosen in such a way that further increase would have no effect on decreasing the RMSE value. After defining the parameters including the number of MFs, the number of iterations and variables of the firefly algorithm and finally the classification of training and testing data, the modeling process begins. Initially, an initial population of fireflies are produced to optimize the antecedent and consequent variables of ANFIS. For the produced generation, light intensity and absorption coefficient of fireflies are calculated. Using the calculated absorption intensity for each firefly, brighter fireflies absorb dimmer ones. In fact, using the intensity of absorption, the movement of fireflies begins [Eq. (10)]. Then, all fireflies are ranked and the best of them are selected as the current best and the process of moving of fireflies continues for the whole defined population, until the stop criterion is reached (number of repetitions or the minimum acceptable RMSE value). At the end of the training process, the efficiency of the developed model for testing data is tested. If the testing data error is acceptable, the modeling is terminated; otherwise, the optimization process should be repeated so that the user-defined parameters should be changed to obtain a better answer.

Wavelet transform
The wavelet transfer function is a mathematical function that decomposes the main function into several functions using a mother function. The wavelet function uses a mathematical relation to break down the main time series into different components, which also removes the limitations presented in the Fourier series (Fallahi 2020). The wavelet function has the ability to extract information from transient and nonperiodic signals. Hence, this method is suitable for time frequency localizing (Jawerth and Sweldens 1994).
The basis of the analysis using wavelet transform is the analysis of time series into two parts, low and high frequency. In fact, each input of the problem, which is a time series, is classified into two main groups: approximation (A) and details (D). If the decomposition level is more than one, the original time series is equal to the time series produced in the first level of the decomposition as A1 + D1, while in the second level of the decomposition, only the produced time series A in the first level is decomposed again, i.e. the original series is equal to A2 + D1 + D2. This process continues until the final level of decomposition is achieved.
The attached wavelet function (CWT) for the signal x(t) is written using the wavelet function ψ as follows: In this relation, a is equal to the scale parameter and b denotes the movement parameter. Due to the time consuming calculation of the values of these two parameters for all values of each parameter, the power of two for each of them is considered. Hence, the discrete wavelet transform is defined as follows:

Proposed hybrid wavelet and ANFIS-FA (WANFIS-FA)
For rainfall modeling using the WANFIS-FA, after classifying the data into two categories of training and testing, the most important lags should be identified so that the model input variables can be defined to forecast precipitation amounts in the future. The autocorrelation function (ACF) as the best known method for determining lags is used to determine this lag. The ACF for the data employed in this study is presented in Fig. 2. The figure shows the significant importance of lags 1 and 2 compared to others. In addition, the results show that there is a period in which lag 12 is used. In order to be more confident in modeling, in addition to the mentioned lags, three lags are also considered as model inputs. Finally, the functional equation for calculating rainfall at the time t is written as: After determining the effective lags, the decomposition level should be evaluated; hence, Eq. (15)  where DL represents the decomposition level and N represents the number of training data. The last parameter to be determined is the type of the mother wavelet, which has a meaningful impact on achieving the optimal model. In this study, the functions Coiflet (Coif), Daubechies (db), Symlet (sym), Haar and Meyer (dmey), which have been used in various studies, are examined.

Goodness of fit
In this study, to test the performance of artificial intelligence models used in this study, the statistical indices comprising the correlation coefficient (R), variance accounted for (VAF), mean relative error (MAE), Nash-Sutcliffe efficiency coefficient (NSC), root-meansquare error (RMSE) and normal root-mean-square error (NRMSE) are utilized (Azimi and Shiri 2020): In the following, the efficiency of the artificial intelligence models in reproducing rainfall amounts in Ardebil is tested. In this paper, Ardebil rainfall time series data over a 44-year period from 1976 to 2020 are utilized to train and test artificial intelligence models. The mentioned data are classified into two parts: training (70% of the total data) and testing (remaining 30%). Other ratios, e.g., 50% for the training and 50% for the testing, 60% for the training and 40% for the testing, 70% for the training and 30% for the testing, and 80% for the training and 20% for the testing, were also applied in this study. However, the ML models showed a better performance when 70% of the dataset and the rest (30% of data) were used for the training and testing these models. Important findings of this study are provided in the following.

Normalization
To facilitate dealing with observed precipitation values in the present study, these data are normalized. Equation (21) is utilized to normalize observed precipitation values: where x n denotes normalized values, x represents observed values, x min represents the minimum observed value, x max is the maximum observed value, and a and b are constant values. In this paper, for a and b, three different values such as a = 1, b = 0,a = 2, b = −1 and a = 0.6, b = 0.2 are chosen and the simulation outcomes for conditions without normalization are also evaluated. The results of the values of the statistical indices obtained for different normalization coefficients and without normalization can be seen in Fig. 3. The results of the simulations showed that the estimations of the correlation coefficient and Nash coefficient for the model without normalization coefficients are equal to 0.840 and 0.676, separately. The NRMSE statistical index for the model with a = 1, b = 0 is estimated to be 0.422. In addition, for the model with a = 2, b = −1 , RMSE, VAF and NRMSE are obtained to be 34.63, 25.068 and 0.911, separately. However, for the model with a = 0.6, b = 0.2 , R, MAE and NSC are calculated to be 0.920, 10.337 and 0.785. Therefore, as it is observed, the model with a = 0.6, b = 0.2 show the highest correlation and the lowest error rate. Figure 4 also demonstrates the comparison of the simulated rainfall amounts with observed values. According to Fig. 6, this numerical model with a = 0.6, b = 0.2 displays acceptable performance. Although this model estimates the precipitation values differently in some areas; however, in general, it is able to forecast the target function values with an reasonable exactness.

Mother wavelet
Wavelet is a small wave whose energy is concentrated in a small area and is a good tool for studying transient phenomena. The wavelet has a minimum oscillation that declines to zero, and this decrease should be limited to the positive and negative degrees within its range. This feature makes the wavelet flexible and allows it to behave like a function. In fact, the wavelet transform is a good way to analyze data and make them without fluctuating. In other words, wavelets are mathematical functions that provide a time-scale expression of time series and their ratios, which is useful for analysis for non-stationary time series. The wavelet transform is able to gain temporal, spatial and frequency information of a signal. In this part, different families of the wavelet transform are examined. In general, the wavelet transform has families called Coif, Haar, db, dmey, and sym. Based on the analysis of all members of the different wavelet transform families, it is found that the dmey owns the highest exactness and the minimum error rate compared to other members of the wavelet family. The results of the statistical indices for the superior members of different wavelet families are given in Fig 21.913, 0.606 and 13.887, separately. Moreover, for sym, VAF, R and RMSE are approximated to be 84.484, 0.919 and 15.701, separately. For coif, VAF and correlation coefficient are calculated equal to 84.230 and 0.920, separately. Based on the simulation results of rainfall, among all the different members of the wavelet family, dmey is introduced as the best member of this family so that the estimations of the NSC and RMSE statistical indices for it are estimated to be 0.959 and 7.835, respectively. Also, R, MAE and NRMSE for dmey are equal to 0.980, 5.949 and 0.206, separately. Therefore, based on the results of all members of the wavelet family, dmey is introduced as the most accurate and optimal member of the wavelet family and in following, this member is used to simulate long-term rainfall in Ardebil. Figure 6 compares the precipitation amounts reproduced by dmey and observed values. Based on the simulation findings, dmey computes the values of the target function with the highest exactness and the least error and demonstrates a high correlation with the observed values.

Decomposition level (DL)
In this study, two decomposition levels (DL) are considered for the wavelet model and the amount of precipitation in Ardebil is examined by them. Figure 7 provides the values of different statistical indices computed to estimate the amount of precipitation by these two decomposition levels. The R, NRMSE and MAE values for DL 1 are calculated to be 0.968, 0.261 and 7.553, respectively. Meanwhile, VAF, NSC and RMSE for DL 2 are obtained to be 96.119, 0.959 and 7.835, respectively.
As shown, DL 2 estimates the precipitation values of Ardebil with better accuracy and less error. Figure 8 also provides the comparison between the rainfall amounts simulated by ِ DL 2 and the observed values. Also, DL 2 can recreate the target parameter values with reasonable efficiency.

Membership function (MF)
In this section, the effect of the number of ANFIS model MFs on the modeling results is examined. Initially, the number of the MFs is assumed to be two, and by increasing it, the ANFIS efficiency is examined via different statistical indices. For example, for situations where the number of MFs is equal to two, the estimations of R, RMSE, and MAE are estimated to be 0.980, 7.835, and 5.949, separately. In addition, the NRMSE and NSC statistical indices for this number of MFs are estimated to be 0.206 and 0.959, respectively. Moreover, with increasing the number of the MFs, the accuracy of the ANFIS model did not increase significantly. For example, when the number of the MFs is considered equal to 5, VAF, MAE and NSC indices are computed to be 98.537, 3.391 and 0.985, respectively. Hence, as seen, increasing MF value has no meaningful influence on the simulation efficiency and only increases the computational costs. So, the number of MFs for this paper is chosen to be equal to 4. It should be noted that the comparison of different statistical indicators for the number of the MFs from two to five is shown in Fig. 9. Figure 10 demonstrates a comparison between the precipitations simulated by the optimal value of MF = 4 and the observed precipitations of Ardabil. According to the simulation results, the ANFIS model with four MFs displays an acceptable performance in estimating precipitation amount. This model, in a situation where the number of the optimal MFs is four, obtains the estimations of the target function with the highest correlation and the lowest error value.

WANFIS-FA models
In this part, the results obtained from the WANFIS-FA models are examined. As mentioned before, these models are produced by combining the ANFIS and the wavelet transform and the firefly algorithm. Figure 11 shows a comparison of different statistical indices for all WANFIS-FA hybrid models. In this study, fourteen WANFIS-FA models are developed using influencing time series data lags. Based on the modeling findings, the VAF and correlation coefficient for the WANFIS-FA 1 model are calculated equal to 44.890 and 0.670, respectively. As observed, the wavelet transform and firefly algorithm meaningfully improves the ANFIS efficiency.  Therefore, among all AI models, WANFIS-FA 14 shows the minimum error value and the maximum correlation with observed precipitation values. Furthermore, the lags (t − 1), (t − 2), (t − 3) and (t − 12) are identified as the most influencing lags in reproducing long-term rainfall in Ardabil by the WANFIS-FA14 model. Figure 12 shows the scatter plot and comparison of the numerical and observed values of the superior model (WANFIS-FA14).

Comparison with ANFIS, ANFIS-FA, and WANFIS
In this step, the superior model (WANFIS-FA 14) efficiency is compared with the ANFIS, ANFIS-FA, and WANFIS models. Figure 13 shows the comparison of the calculated statistical indices for the ANFIS, ANFIS-FA, WANFIS and WANFIS-FA models. According to the simulation outcomes, VAF and RMSE for ANFIS, respectively, are obtained to be 38.294 and 31.579 and these statistical indices for the ANFIS-FA model are equal to 42.858 and 0.798, individually. However, for WANFIS, the R, MAE and NSC indices are estimated as 0.971, 7.299 and 0.936, respectively. As can be seen, the WANFIS model has higher accuracy than ANFIS-FA and the ANFIS model has the worst performance in estimating the precipitation values.  In the current research, a hybrid fuzzy model is extended to reproduce long-term precipitation in Ardabil. The model estimates the target parameter values with appropriate exactness. Furthermore, by analyzing the simulation results, the most effective time series data lags are introduced. Although significant results have been achieved in the current study, there were some limitations, e.g., lack of data before 1976. Moreover, the ML mode was not able to provide an explicit relationship to estimate the target value and this was another drawback of current work.

Conclusion
Rainfall simulation is very important in Iran because, in recent years, the pattern of rainfall in various parts of the arid and semi-arid areas has undergone significant changes. Also, in the northern regions of Iran, including the northern provinces, rainfall is the most important source of water supply for different demands. In this study, as the first time, a new optimized neuro-fuzzy model was introduced, which was built by integrating ANFIS, wavelet transform and firefly algorithm. Then, the long-term rainfall of Ardabil in a long-term period of 44 years from 1976 to 2020 was estimated by this model. To do this, initially, the data were classified into two categories: training (32 years) and testing (12 years). Next, the influencing lags of the time series data were detected using an autocorrelation function. Also, different members of the mother wavelets were examined and in the end, the most optimal one was selected. Using these lags and optimized mother wavelet, 14 different WANFIS-FA models were developed. By evaluating the modeling outcomes, the best models as well as the most influencing input lags were detected. For the best model, the estimations of the correlation coefficient and VAF were yielded to be 0.99 and 98.08, respectively. The findings of the sensitivity analysis demonstrated that the lags (t − 1), (t − 2), (t − 3) and (t − 12) were introduced as the most influencing time series data lags. In addition, examinations have shown that the wavelet transform significantly promoted the performance of the firefly algorithm, and this hybrid model simulated long-term rainfall in Ardabil with remarkable accuracy. Other optimization algorithms can be used in the next works. Additionally, the current methodology may be applied for other cities in the area.
Funding The authors received no specific funding for this work.

Declarations
Conflict of interest All authors declare they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.