Abstract
Flooding is recognized worldwide joined of the most expensive natural hazards. To adopt proper structural and nonstructural measurements for controlling and mitigating the rising flood risk, the availability of streamflow values along a river is essential. This raises concerns in the hydrological assessment of poorly gauged or ungauged catchments. In this regard, several flood frequency analysis approaches have been conducted in the literature including index flow method (IFM), square grids method (SGM), hybrid method (HM), as well as the conventional multivariate regression method (MRM). While these approaches are often based on assumptions that simplify the complex nature of the hydrological system, they might not be able to address uncertainties associated with the complexity of the system. One of the powerful tools to deal with this issue is data-driven model that can be easily adopted in complex systems. The objective of this research is to utilize three different data-driven models: random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), and M5 decision tree algorithm to predict peak flow associated with various return periods in ungauged catchments. Results from each data-driven model were assessed and compared with the conventional multivariate regression method. Results revealed all the three data-driven models performed better than the multivariate regression method. Among them, the RF model not only demonstrated the superior performance of peak flow prediction compared to the other algorithms but also provided insight into the complexity of the system through delivering a mathematical formulation.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
Introduction
Floods are among the costliest natural hazards experienced in most of the places in the world, which results in heavy losses of life and economic damages (Gao et al. 2018). Regional flood frequency analysis (RFFA) is an especially used method for estimating flood risk at target locations in river basins where streamflow measurements are either limited or unavailable (Griffis and Stedinger 2007; Leclerc and Ouarda 2007; Zaman et al. 2012, Smith et al. 2015 and Lotfirad et al. 2018) The first RFFA was undertaken by the USGS in New England (Kinnison and Colby 1945). Lately, RFFA has received significant heed for design and management of water substructures such as dams, reservoirs, and bridges to slight flood risks and hence financial devastations. Much of this attention has occurred in response to water-related hazards such as the flood in regions where minute or no data is available on peak flows such as the Indus floods in East of Iran.
RFFA has been used as a crucial tool for many applications for example (i) water substructure, (ii) flood preservation projects, (iii) land cover planning, and other hydrologic studies. The regional techniques consist of a multivariate statistical structure derived from catchment characteristics data (Rao and Srinivas 2008). In this process, the region of influence approach can be formed where some catchments are pooled together based on vicinity in geographic. Consequently, an optimum region is made based on some objective function (Holmes et al. 2002; Aziz et al. 2010).
The relationship among flood flow values, physiographical and/or morphological characteristics of a catchment is a fundamental framework for RFFA. Various methods such as multivariate regression method (MRM), square grids method (SGM) and hybrid method (HM) have been developed in the literature for RFFA; each approach has its advantages and disadvantages. Among these methods, MRM has been widely used and presented in previous studies (Golestani et al. 2010; Malekinezhad et al. 2011; Latt et al. 2014).
Recently, data-driven models have been widely adopted in hydrological studies. Data-driven models can handle the nonlinearity and uncertainty in hydrological data very effectively. These methods are extensively used for rainfall-runoff modeling, water resource management modeling, and suspended sediment modeling take; for example, Kumar et al. (2015) provide the details of the application of data-driven models in regional flood frequency estimation which is explored. Regional flood frequency relationships are developed employing data-driven models viz. ANN and FIS for lower Godavari subzone 3(f) of India and the same have been compared with the regional relationships derived using the L-moments approach.
Recently, the data-driven models such as random forest (RF), fuzzy computing techniques and M5 decision tree, in complex modeling of the flood, have been developed by Solomatine and Xue (2004); Aziz et al. (2014); Sehgal et al (2014); Kumar et al (2015); Latt et al. (2015); Deo and Şahin (2016); Esmaeili-Gisavandani (2017); Ghumman et al. (2018); Zahiri and Nezaratian (2020); Desai and Ouarda (2021); Adib et al. (2022); Jahangir et al. (2022). Among the numerous data mining techniques, ANFIS is the most widely used approaches in various water-related areas. Being an accurate predictive tool, the ANFIS technique has, however, an inherent disadvantage that often results in hesitating to interpret their outputs. This is because of being a black box and consequently the nature of their solution is hazy. There might be a variation between networks of the same architecture trained on the same dataset due to the arbitrary nature of the internal representation. (Witten and Frank 2000). Srinivas et al. (2008); Aziz et al. (2017); Esmaelili-Gisavandani et al. (2017); Sharifi Garmdareh et al. (2018), and Zalnezhad et al. (2022) attempted to shed light on the structure of ANFIS using regional flood analysis and the methods of recovering rules. However, few studies have been used the application of the M5 algorithm and random forest in RFFA. Therefore, in this study, random forest and M5 algorithms were used to investigate peak flow and compare ANFIS with the multivariate regression method. The most advantage of M5 and RF have been classified by being induced into linear patches; these models provide a representation that is reproducible and understandable by practitioners. (Solomatine and Xue 2004; Jothiprakash and Kote 2011). The target of RFFA is predicting river peak flow associated with various return periods in ungauged catchments and also reduce uncertainty to evaluate the flooding (Merz and Blöschl 2008; Zaman et al. 2012; Shu and Ouarda 2012; Leščešen et al. 2019). The objectives of this study are two-fold. Firstly, the aim is to estimate the RFFA utilizing the random forest and M5 algorithms. Secondly, the goal is to estimate flood occurrences in data-deficient catchments within the western region of Iran.
Materials and methods
Out of eighty-nine stream gauges, thirty-two stations were used due to the availability of data. The data were obtained for the period of 1987-2018. The study area, including thirty-two stream gauges, is located in the west of Iran. From the homogeneous stations, twenty-seven stations were used for calibration and five stations for validation of the models (Fig. 1b). In fact, five stations were used for validation which was not used for the modeling and after making model, each data-driven model validated with these stations. To approach a unique model, the return period was taken into account as an independent factor. The study considered the annual maximum instantaneous peak flow. Kolmogorov–Smirnov test in EasyFit software 5.6 was used to estimate peak flow with different return periods based on the best distribution function (Shokouhifar et al. 2022).
Study area
The Karkheh River basin is located in the west of Iran (Fig. 1a). The Karkheh River basin covers 51,230 square kilometers in parts of six Iranian provinces. The Karkheh river length is approximately 900 km. (Fallah-Mehdipour et al. 2020). This study considers RFFA using three data-driven models (M5, RF and ANFIS) and a multivariate regression method in ungauged catchments. The further detail of Karkheh Basin could be found in many papers (see, e.g., Gheitasi 2016; Zamani et al. 2015).
Data
The following data were used in this study: (i) annual maximum instantaneous peak flow that was obtained from the Ministry of Energy of Iran and (ii) the physiographic characteristics of the catchment were extracted from the ALOS-PALSAR satellite with a spatial resolution of 12.5 m (https://asf.alaska.edu). The extraction of physiographic characteristics was carried out in the ArcGIS software 10.5.
The data utilized in this study, including the annual maximum flood (Q), ranges from 17.5 to 1337.8 m3/s. The flood discharge was calculated with a return period (T) of 2, 10, 100, 1000 years. Drainage areas (A) range from 8.17 to 26,187.02 km2. The range of the height of each sub-basin (H) is 1043 to 2621 m; the range of stream length (L) is between 4.77 to 420.82 km and catchment slope (S) varying from 8.14% to 37.67%. Table 1 presents the physiographic characteristics of the studied catchments.
Data-driven modeling
Data-driven modeling relies on relationships between measured data without a need for a priori knowledge of the physical system behavior (Jones et al. 2013; Ashrafzadeh et al. 2020; Biazar et al. 2020; Jafarpour et al. 2022). Once trained, data-driven modeling becomes a parametric description of the function. Out of several possible data-driven methods, ANFIS is the most widely used ones in water resource applications, whereas less attention has been directed toward the RF and M5 model trees. 70% of the data were used for training and 30% for testing in all models. A brief description of the methods mentioned above, is summarized as follows:
M5 model
M5 model is a data-driven model proposed by Quinlan (1992) and mainly employed in the realm of water science (Rahimikhoob 2014; Kisi and Kilic 2016; Kisi and Parmar 2016). Continuously, the final structure together with the dependent leaves is shown as a tree in Fig. 2b. The further detail of M5 model could be found in many papers (e.g., Farajpanah et al. 2020; Adib et al. 2023).
Random forest
The RF method is nonparametric and belongs to the family of ensemble methods. The RF method consists of a set of regression trees used to reconstruct educational data. Typically, a set of basic training examples is formed. Combining three parameters in RF is essential. The first is how many trees should be created, the second is how many variables are involved in creating a node for each network, and the third parameter is the size of the node, which indicates the depth of the regression tree created. One of the advantages of this method is that there is no need to prune the trees during modeling and classification (Esmaeili-Gisavandani et al. 2022).
ANFIS
Adaptive neuro-fuzzy inference system (ANFIS) could be a multilayer feed-forward network where each node performs a selected function on incoming signals (Jang 1993; Heddam 2014). An ordinary architecture of an ANFIS, during which a circle indicates a set node, whereas a square indicates an adaptive node, is shown in Fig. 3. The further detail of ANFIS could be found in many books (e.g., Azar 2010; Esmaeili-Gisavandani 2017; Adib et al. (2021).
Evaluation criteria
Normal root-mean-square error (NRMSE) and correlation coefficient (R2) were used to evaluate model performance:
where Xi and Yi are the observed and estimated values and \(\overline{{X_{i} }}\) are the average values of observation, and n represents the number of data. A comparison of the correlation coefficient and RMSE values recognizes a better performance. The best model has higher value of R2 and a smaller value of RMSE.
Results
According to Table 2, four combinations of input data were used in the MRM, ANFIS, M5 and RF models to peak flow with different return periods for regional flood frequency analysis (RFFA) (Table 3).
ANFIS results
To calculate RFFA with the ANFIS model, for any combination, an optimum number of membership functions was specified based on trial and error. The best type of membership function was recognized from between bell-shaped (gbellmf), trapezoidal-shaped (tramf), triangular-shaped (trimf), Gaussian (gaussmf) and Gaussian 2 (gauss2mf) by repeated model training and testing based on every membership function number and type via trial and error. Based on the correlation coefficient (R2) and root-mean-square error (RMSE), combinations 4 (R2 = 0.92 and NRMSE = 0.851) is better performance than the others (Fig. 4).
M5 results
The M5 model tree does not require to set any user-defined parameters. In addition, the M5 model can provide the number of linear relations which can be easily used to predict the RFFA, as shown in Fig. 5.
As shown in Table 4, the M5 model results indicated that input combination 4 gave a better performance than the other combinations (R2 = 0.95 and NRMSE = 0.45). The tree relationships of the M5 model for the best combination of the inputs are presented in the appendix.
RF results
As shown in Table 5, the RF model results indicated that input combination 4 gave a better performance than the other combinations (R2 = 0.96 and NRMSE = 0.223).
As shown in Fig. 5, the peak flow values estimated by each model are compared. RF has the best performance in peak flow estimation, while MRM has the worst. Furthermore, most of the models underestimated the peak flow in the 2-year and 10-year return periods, while most overestimated the peak flow in the 100-year and 1000-year return periods.
As Fig. 6 illustrates performance of the used models in calibration (twenty-seven stream gauges) and validation (5 stream gauges) stages, according to the Taylor diagrams (Fig. 6), the performance of the RF model is the best. In the return periods of 2, 10, and 1000 years, the M5 model ranks second after the RF, but in the 100-year return period, the ANFIS model ranks second after RF.
Discussion
This study aims to provide a relatively simple method to estimate peak flow amounts in ungauged region based on their physiographic characteristics. To achieve this, data-driven models of varying natures were used. The MRM model is based on regression, the ANFIS model is based on fuzzy logic, the M5 model is based on classification, and the RF model is based on ensemble learning under supervision. Models require inputs such as area, stream length, basin slope, basin height, and return period number, which can all be derived from topography. Also, the best combination of inputs belonged to combination 4 with a higher correlation coefficient and lower NRMSE. Based on the excellent results obtained in estimating peak flow in the calibration and verification stages, particularly using the RF model, it is clear that this study is far more effective than similar studies whose inputs and modeling process were incredibly complex. It is evident from Fig. 6 that the models used for estimating peak flow had a favorable performance, especially the RF model, with better accuracy in the short-term return period of 2 and 10 years than in the long-term return period (100 and 1000 years).
RFFA makes a relationship between flood frequency and physiographical characteristics of catchments to estimate flood in ungagged regions like Rahman et al. (2020). In this regard, the performances of the RF and M5 tree network as piecewise linear functions, ANFIS and multivariate regression method were evaluated to estimate flood frequency in the ungagged sub-catchments like Vafakhah and Bozchaloei (2020).
A comparison of the correlation coefficient and root-mean-squared error values indicated an improved performance obtained from the data-driven model compared to traditional methods such as the multivariate regression (MRM) Method. However, the performance of the RF model is almost similar to the M5 and ANFIS models.
Conclusions
Knowing the magnitude of historical floods in a particular area is crucial for designing hydraulic structures. Small and medium watersheds often lack ground flow measurement stations due to the costs involved in building and maintaining them. In contrast, hydraulic structures need to be built on rivers in these areas in order to develop civil and agricultural activities. Therefore, the flood discharge design must be determined. This study used machine learning models to estimate the peak flow of ungauged watersheds.
The following model performance was found in this study:
The procreated dendriform structure of multi-linear models utilized in RF and M5 is comprehensible and straightforward to grasp for decision-makers. It also provides an honest overview of the relationships between the physiographic characteristics of the watershed;
The RF and M5 model permits to simply create a family of explainable models with a varied number of component models and thus varied strength and correctness;
Modeling with the RF and M5 are the fastest data-driven models (proceeding of data with RF and M5 is faster than ANFIS);
The information encapsulated in RF and M5 algorithms can potentially assist in variable selection and the evaluation of their relationships when processing data with other models. For instance, M5 can aid users in determining the sensitivity of the data.
References
Adib A, Zaerpour A, Kisi O, Lotfirad M (2021) A rigorous wavelet-Packet transform to retrieve snow depth from SSMIS data and evaluation of its reliability by uncertainty parameters. Water Resour Manage 35:2723–2740. https://doi.org/10.1007/s11269-021-02863-x
Adib A, Farajpanah H, Shoushtari MM, Lotfirad M, Saeedpanah I, Sasani H (2022) Selection of the best machine learning method for estimation of concentration of different water quality parameters. Sustain Water Resour Manag 8(6):172. https://doi.org/10.1007/s40899-022-00765-3
Adib A, Kalantarzadeh SSO, Shoushtari MM, Lotfirad M, Liaghat A, Oulapour M (2023) Sensitive analysis of meteorological data and selecting appropriate machine learning model for estimation of reference evapotranspiration. Appl Water Sci 13(3):83. https://doi.org/10.1007/s13201-023-01895-5
Ashrafzadeh A, Kişi O, Aghelpour P, Biazar SM, Masouleh MA (2020) Comparative study of time series models, support vector machines, and GMDH in forecasting long-term evapotranspiration rates in northern Iran. J Irrig Drain Eng 146(6):04020010. https://doi.org/10.1061/(ASCE)IR.1943-4774.0001471
Aziz K, Rahman A, Fang G, Shrestha S (2014) Application of artificial neural networks in regional flood frequency analysis: a case study for Australia. Stoch Env Res Risk Assess 28(3):541–554. https://doi.org/10.1007/s00477-013-0771-5
Aziz K, Haque MM, Rahman A, Shamseldin AY, Shoaib M (2017) Flood estimation in ungauged catchments: application of artificial intelligence-based methods for Eastern Australia. Stoch Env Res Risk Assess 31(6):1499–1514. https://doi.org/10.1007/s00477-016-1272-0
Aziz K, Rahman A, Fang G, Haddad K, & Shrestha S (2010) Design flood estimation for ungauged catchments: application of artificial neural networks for eastern Australia. In: World Environmental and Water Resources Congress 2010: Challenges of Change (pp 2841–2850). doi:https://doi.org/10.1061/41114(371)293
Biazar SM, Rahmani V, Isazadeh M, Kisi O, Dinpashoh Y (2020) New input selection procedure for machine learning methods in estimating daily global solar radiation. Arab J Geosci 13:1–17. https://doi.org/10.1007/s12517-020-05437-0
Deo RC, Şahin M (2016) An extreme learning machine model for the simulation of monthly mean streamflow water level in eastern Queensland. Environ Monit Assess 188(2):90. https://doi.org/10.1007/s10661-016-5094-9
Desai S, Ouarda TB (2021) Regional hydrological frequency analysis at ungauged sites with random forest regression. J Hydrol 594:125861. https://doi.org/10.1016/j.jhydrol.2020.125861
Esmaeili Gisavandani H (2017) Evaluation of the ability of adaptive neuro-fuzzy interface system, artificial neural network and regression to regional flood analysis. J Water Soil Conserv 24(3):149–166. https://doi.org/10.22069/JWFST.2017.11413.2581
Esmaeili-Gisavandani H, Farajpanah H, Adib A, Kisi O, Riyahi MM, Lotfirad M, Salehpoor J (2022) Evaluating ability of three types of discrete wavelet transforms for improving performance of different ML models in estimation of daily-suspended sediment load. Arab J Geosci 15(1):1–13. https://doi.org/10.1007/s12517-021-09282-7
Fallah-Mehdipour E, Bozorg-Haddad O, Loáiciga HA (2020) Climate-environment-water: integrated and non-integrated approaches to reservoir operation. Environ Monit Assess 192(1):60. https://doi.org/10.1007/s10661-019-8039-2
Farajpanah H, Lotfirad M, Adib A, Esmaeili-Gisavandani H, Kisi Ö, Riyahi MM, Salehpoor J (2020) Ranking of hybrid wavelet-AI models by TOPSIS method for estimation of daily flow discharge. Water Supply 20(8):3156–3171. https://doi.org/10.2166/ws.2020.211
Gao W, Shen Q, Zhou Y, Li X (2018) Analysis of flood inundation in ungauged basins based on multi-source remote sensing data. Environ Monit Assess 190(3):129. https://doi.org/10.1007/s10661-018-6499-4
Gheitasi M (2016) Flood frequency analysis of the maximum annual discharge of rivers in Lorestan province (case study: Karkheh watershed in Lorestan province) (Doctoral dissertation, University of Zabol)
Ghumman AR, Ahmad S, Hashmi HN (2018) Performance assessment of artificial neural networks and support vector regression models for stream flow predictions. Environ Monit Assess 190(12):704. https://doi.org/10.1007/s10661-018-7012-9
Golestani M, Kavianpour MR, & Hedayatizade M (2010, November) Determination of homogeneous regions case study: South-East Urmia Lake Catchment, Iran. In: 2010 2nd International Conference on Chemical, Biological and Environmental Engineering (pp 71–74). IEEE, doi: https://doi.org/10.1109/ICBEE.2010.5648935
Griffis VW, Stedinger JR (2007) The use of GLS regression in regional hydrologic analysis. J Hydrol 344(1):82–95. https://doi.org/10.1016/j.jhydrol.2007.06.023
Heddam S (2014) Modeling hourly dissolved oxygen concentration (DO) using two different adaptive neuro-fuzzy inference systems (ANFIS): a comparative study. Environ Monit Assess 186(1):597–619. https://doi.org/10.1007/s10661-013-3402-1
Holmes MGR, Young AR, Gustard A, Grew R (2002) A region of influence approach to predicting flow duration curves within ungauged catchments. Hydrol Earth Syst Sci 6:721–731
Jafarpour M, Adib A, Lotfirad M (2022) Improving the accuracy of satellite and reanalysis precipitation data by their ensemble usage. Appl Water Sci 12(9):232. https://doi.org/10.1007/s13201-022-01750-z
Jahangir MS, Biazar SM, Hah D, Quilty J, Isazadeh M (2022) Investigating the impact of input variable selection on daily solar radiation prediction accuracy using data-driven models: a case study in northern Iran. Stoch Env Res Risk Assess 36(1):225–249. https://doi.org/10.1007/s00477-021-02070-5
Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685. https://doi.org/10.1109/21.256541
Jones RM, Liu L, Dorevitch S (2013) Hydrometeorological variables predict fecal indicator bacteria densities in freshwater: data-driven methods for variable selection. Environ Monit Assess 185(3):2355–2366. https://doi.org/10.1007/s10661-012-2716-8
Jothiprakash V, Kote AS (2011) Effect of pruning and smoothing while using M5 model tree technique for reservoir inflow prediction. J Hydrol Eng 16(7):563–574
Kinnison HB, Colby BR (1945) Flood formulas based on drainage basin characteristics. Trans Am Soc Civ Eng 110(1):849–876
Kisi O, Kilic Y (2016) An investigation on generalization ability of artificial neural networks and M5 model tree in modeling reference evapotranspiration. Theoret Appl Climatol 126(3–4):413–425. https://doi.org/10.1007/s00704-015-1582-z
Kisi O, Parmar KS (2016) Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J Hydrol 534:104–112. https://doi.org/10.1016/j.jhydrol.2015.12.014
Kumar R, Goel NK, Chatterjee C, Nayak PC (2015) Regional flood frequency analysis using soft computing techniques. Water Resour Manage 29(6):1965–1978. https://doi.org/10.1007/s11269-015-0922-1
Latt ZZ (2015) Application of feedforward artificial neural network in Muskingum flood routing: a black-box forecasting approach for a natural river system. Water Resour Manage 29(14):4995–5014. https://doi.org/10.1007/s11269-015-1100-1
Latt ZZ, Wittenberg H (2014) Improving flood forecasting in a developing country: a comparative study of stepwise multiple linear regression and artificial neural network. Water Resour Manage 28(8):2109–2128. https://doi.org/10.1007/s11269-014-0600-8
Leclerc M, Ouarda TBMJ (2007) Non stationary regional frequency analysis at ungaged sites. J Hydrol 343(3):254–265. https://doi.org/10.1016/j.jhydrol.2007.06.021
Leščešen I, Urošev M, Dolinaj D, Pantelić M, Telbisz T, Varga G, Milošević D (2019) Regional flood frequency analysis based on L-moment approach case study Tisza river basin. Water Resour 46(6):853–860. https://doi.org/10.1134/S009780781906006X
Lotfirad M, Adib A, Haghighi A (2018) Estimation of daily runoff using of the semi-conceptual rainfall-runoff IHACRES model in the Navrood watershed (a watershed in the Gilan province. Iran J Ecohydrol 5(2):449–460
Malekinezhad H, Nachtnebel HP, Klik A (2011) Comparing the index-flood and multiple-regression methods using L-moments. Phys Chem Earth, Parts a/b/c 36(1–4):54–60. https://doi.org/10.1016/j.pce.2010.07.013
Merz R, Blöschl G (2008) Flood frequency hydrology: 2. Combining data evidence. Water Resour Res. https://doi.org/10.1029/2007wr006744
Quinlan JR (1992, November) Learning with continuous classes. In: 5th Australian joint conference on artificial intelligence (Vol 92, pp 343–348)
Rahimikhoob A (2014) Comparison between M5 model tree and neural networks for estimating reference evapotranspiration in an arid environment. Water Resour Manage 28(3):657–669. https://doi.org/10.1007/s11269-013-0506-x
Rahman AS, Khan Z, Rahman A (2020) Application of independent component analysis in regional flood frequency analysis: comparison between quantile regression and parameter regression techniques. J Hydrol 581:124372. https://doi.org/10.1016/j.jhydrol.2019.124372
Rao AR, Srinivas VV (2008) Regionalization of watersheds: an approach based on cluster analysis. Springer Science and Business Media, Cham
Sehgal V, Sahay RR, Chatterjee C (2014) Effect of utilization of discrete wavelet components on flood forecasting performance of wavelet based ANFIS models. Water Resour Manage 28(6):1733–1749. https://doi.org/10.1007/s11269-014-0584-4
Sharifi Garmdareh E, Vafakhah M, Eslamian SS (2018) Regional flood frequency analysis using support vector regression in arid and semi-arid regions of Iran. Hydrol Sci J 63(3):426–440. https://doi.org/10.1080/02626667.2018.1432056
Shokouhifar Y, Lotfirad M, Esmaeili-Gisavandani H, Adib A (2022) Evaluation of climate change effects on flood frequency in arid and semi-arid basins. Water Supply 22(8):6740–6755. https://doi.org/10.2166/ws.2022.271
Shu C, Ouarda TB (2012) Improved methods for daily streamflow estimates at ungauged sites. Water Resour Res. https://doi.org/10.1029/2011WR011501
Smith A, Sampson C, Bates P (2015) Regional flood frequency analysis at the global scale. Water Resour Res 51(1):539–553. https://doi.org/10.1002/2014WR015814
Solomatine DP, Xue Y (2004) M5 model trees and neural networks: application to flood forecasting in the upper reach of the Huai River in China. J Hydrol Eng 9(6):491–501. https://doi.org/10.1061/(ASCE)1084-0699(2004)9:6(491)
Srinivas VV, Tripathi S, Rao AR, Govindaraju RS (2008) Regional flood frequency analysis by combining self-organizing feature map and fuzzy clustering. J Hydrol 348(1–2):148–166. https://doi.org/10.1016/j.jhydrol.2007.09.046
Vafakhah M, Bozchaloei SK (2020) Regional analysis of flow duration curves through support vector regression. Water Resour Manage 34(1):283–294. https://doi.org/10.1007/s11269-019-02445-y
Wang L, Kisi O, Zounemat-Kermani M, Zhu Z, Gong W, Niu Z, Liu Z (2017) Prediction of solar radiation in China using different adaptive neuro-fuzzy methods and M5 model tree. Int J Climatol 37(3):1141–1155. https://doi.org/10.1002/joc.4762
Zahiri J, Nezaratian H (2020) Estimation of transverse mixing coefficient in streams using M5, MARS, GA, and PSO approaches. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-020-07802-8
Zalnezhad A, Rahman A, Vafakhah M, Samali B, Ahamed F (2022) Regional flood frequency analysis using the FCM-ANFIS algorithm: a case study in South-Eastern Australia. Water 14(10):1608. https://doi.org/10.3390/w14101608
Zaman MA, Rahman A, Haddad K (2012) Regional flood frequency analysis in arid regions: a case study for Australia. J Hydrol 475:74–83. https://doi.org/10.1016/j.jhydrol.2012.08.054
Zamani R, Tabari H, Willems P (2015) Extreme streamflow drought in the Karkheh river basin (Iran): probabilistic and regional analyses. Nat Hazards 76(1):327–346. https://doi.org/10.1007/s11069-014-1492-x
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Authors and Affiliations
Contributions
The authors declare that they have contribution in the preparation of this manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
The manuscript is an original work with its own merit, has not been previously published in whole or in part, and is not being considered for publication elsewhere.
Additional information
Publisher's Note
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Regression expressions of the best M5 model
Here, the linear regressions of the best results of M5 modeling.
If H < = 1767.802 and S < = 16.041 and T < = 17.5 then LM number = 1.
LM number = 1.
Q = 0.3532 × T + 0.0288 × A—0.4854 × H + 1.0686 × L + 27.7689 × S + 323.3353.
If H < = 1767.802 and S < = 16.041 and T > 17.5 then LM number = 2.
LM number = 2.
Q = 0.4087 × T + 0.0992 × A—0.4854 × H + 1.0686 × L + 27.7689 × S + 287.7801.
If H < = 1767.802 and S > 16.041 and H < = 1639.885 and A < = 72.041 and T < = 17.5 and A < = 35.66 then LM number = 3.
LM number = 3.
Q = 9.915 × T—0.1856 × A—3.4894 × H + 3.4408 × L + 19.681 × S + 5136.1149.
If H < = 1767.802 and S > 16.041 and H < = 1639.885 and A < = 72.041 and T < = 17.5 and A > 35.66 then LM number = 4.
LM number = 4.
Q = 8.2118 × T—1.1938 × A—3.4894 × H + 3.4408 × L + 19.681 × S + 5144.5592.
If H < = 1767.802 and S > 16.041 and H < = 1639.885 and A < = 72.041 and T > 17.5 then LM number = 5.
LM number = 5.
Q = 1.6026 × T + 0.045 × A—4.3676 × H + 3.4408 × L + 19.681 × S + 6531.1303.
If H < = 1767.802 and S > 16.041 and H < = 1639.885 and A > 72.041 and T < = 17.5 then LM number = 6.
LM number = 6.
Q = 21.6586 × T + 0.0392 × A—3.0971 × H + 3.4408 × L + 19.681 × S + 5203.2448.
If H < = 1767.802 and S > 16.041 and H < = 1639.885 and A > 72.041 and T > 17.5 then LM number = 7.
LM number = 7.
Q = 2.7201 × T + 0.0339 × A—3.0971 × H + 3.4408 × L + 19.681 × S + 5622.2802.
If H < = 1767.802 and S > 16.041 and H > 1639.885 and L < = 67.812 then LM number = 8.
LM number = 8.
Q = 0.8159 × T—0.0317 × A—1.332 × H + 4.5635 × L + 19.681 × S + 1703.4591.
If H < = 1767.802 and S > 16.041 and H > 1639.885 and L > 67.812 and T < = 17.5 and A < = 6591.094 then LM number = 9.
LM number = 9.
Q = 10.8508 × T—0.0197 × A—1.332 × H + 4.0254 × L + 19.681 × S + 1833.1274.
If H < = 1767.802 and S > 16.041 and H > 1639.885 and L > 67.812 and T < = 17.5 and A > 6591.094 then LM number = 10.
LM number = 10.
Q = 12.7516 × T—0.0197 × A—1.332 × H + 4.0254 × L + 19.681 × S + 1851.5772.
If H < = 1767.802 and S > 16.041 and H > 1639.885 and L > 67.812 and T > 17.5 then LM number = 11.
LM number = 11.
Q = 1.1691 × T—0.0107 × A—1.332 × H + 4.0254 × L + 19.681 × S + 1967.2823.
If H > 1767.802 and A < = 157.306 then LM number = 12.
LM number = 12.
Q = 0.1579 × T + 0.0084 × A—0.3096 × H + 0.4856 × L + 15.8241 × S + 331.4818.
If H > 1767.802 and A > 157.306 and S < = 20.719 and T < = 17.5 then LM number = 13.
LM number = 13.
Q = 0.2033 × T + 0.0109 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 225.23134818.
If H > 1767.802 and A > 157.306 and S < = 20.719 and T > 17.5 and H < = 1959.692 and A < = 1418.605 then LM number = 14.
LM number = 14.
Q = 0.2281 × T + 0.031 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 212.4058.
If H > 1767.802 and A > 157.306 and S < = 20.719 and T > 17.5 and H < = 1959.692 and A > 1418.605 then LM number = 15.*
LM number = 15.
Q = 0.2382 × T + 0.031 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 218.4885.
If H > 1767.802 and A > 157.306 and S < = 20.719 and T > 17.5 and H > 1959.692 then LM number = 16.
LM number = 16.
Q = 0.2111 × T + 0.0109 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 254.9837.
If H > 1767.802 and A > 157.306 and S > 20.719 and T < = 17.5 then LM number = 17.
LM number = 17.
Q = 0.3026 × T + 0.2288 × A—0.2609 × H + 0.4856 × L + 19.3761 × S + 20.518.
If H > 1767.802 and A > 157.306 and S > 20.719 and T > 17.5 then LM number = 18.
LM number = 18.
Q = 0.3336 × T + 0.3017 × A—0.2609 × H + 0.4856 × L + 19.3761 × S—22.9879.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Esmaeili-Gisavandani, H., Zarei, H. & Fadaei Tehrani, M.R. Regional flood frequency analysis using data-driven models (M5, random forest, and ANFIS) and a multivariate regression method in ungauged catchments. Appl Water Sci 13, 139 (2023). https://doi.org/10.1007/s13201-023-01940-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13201-023-01940-3