Development of a convolutional neural network based regional flood frequency analysis model for South-east Australia

Afrin, Nilufa; Ahamed, Farhad; Rahman, Ataur

doi:10.1007/s11069-024-06669-z

Original Paper
Open access
Published: 14 May 2024

Natural Hazards

Abstract

Floods are among the worst natural disasters and cause significant damage to the economy and society. Flood risk assessment helps to reduce flood damage by managing flood risk in flood-affected areas. For ungauged catchments, regional flood frequency analysis (RFFA) is generally used for design flood estimation. This study develops a Convolutional Neural Network (CNN) based RFFA technique using data from 201 catchments in south-east Australia. The CNN based RFFA technique is compared with multiple linear regression (MLR), support vector machine (SVM), and decision tree (DT) based RFFA models. Based on a split-sample validation using several statistical indices such as relative error, bias and root mean squared error, it is found that the CNN model performs best for annual exceedance probabilities (AEPs) in the range of 1 in 5 to 1 in 100, with median relative error values in the range of 29–44%. The DT model shows the best performance for 1 in 2 AEP, with a median relative error of 24%. The CNN model outperforms the currently recommended RFFA technique in the Australian Rainfall and Runoff (ARR) guideline. The findings of this study will assist in upgrading the RFFA techniques in the ARR guideline in the near future.

1 Introduction

Flood is a natural hazard that can cause significant damage to agriculture and infrastructure (Longman et al. 2019). A design flood is a flood discharge associated with an annual exceedance probability (AEP), and it is widely used in flood risk assessment. At-site flood frequency analysis (FFA) is generally used to estimate design floods when recorded flood data of sufficient length and quality are available at the site of interest (Kuczera and Franks 2019). However, there are numerous ungauged catchments where FFA is not directly applicable. Regional flood frequency analysis (RFFA) is used for these ungauged catchments; it attempts to transfer flood characteristics from gauged to ungauged catchments on the basis of regional homogeneity (Şen 1980; Cunnane 1988; Shu and Ouarda 2008). With the increase in computing power, RFFA techniques have evolved over the years from the simple rational method to more complex data-driven models (Potter 1987; Kirby and Moss 1987; NRC 1988; Bobee et al. 1993; Jingyi and Hall 2004; Dawson et al. 2006; Archfield et al. 2013; Chebana et al. 2014; Msilini et al. 2020; Zalnezhad et al. 2022a, b; Esmaeili-Gisavandani et al. 2023).

Numerous linear RFFA techniques have been proposed over the years, such as the probabilistic rational method (Pilgrim and Cordery 1993; Rahman et al. 2011; Gilmore et al. 2014), the index flood method (Hosking and Wallis 1993; Bates et al. 1998; Rahman et al. 1998; Smith et al. 2015; Zalnezhad et al. 2023) and ordinary least squares and generalized least squares based quantile regression techniques (QRT) (Stedinger and Tasker 1985; Rahman 2005; Ouarda et al. 2008; Haddad and Rahman 2012; Zalnezhad et al. 2022a, b). According to Sivakumar and Singh (2012), hydrologic processes are often non-linear because many of the processes involved in the movement and distribution of runoff are non-linear. With the advent of computer technology, non-linear techniques based on artificial intelligence (AI) are increasingly being adopted to develop RFFA models (Dawson et al. 2006; Aziz et al. 2013, 2014, 2017; Zorn and Shamseldin 2015; Ghaderi et al. 2019; Vafakhah and Khosrobeigi Bozchaloei 2020; Filipova et al. 2022; Zalnezhad et al. 2022a). In most of these studies, AI-based models have outperformed the linear RFFA models.

Recently, AI-based techniques such as deep learning (DL) methods have received attention because they are better able to identify patterns and features in large datasets with greater accuracy. For example, Jiang et al. (2022) applied a DL method to predict relative humidity, compared it with support vector regression (SVR), decision tree (DT) regression, and deep residual (DR) regression, and found that DL outperformed the other methods. CNN is a type of DL method that has demonstrated state-of-the-art performance in many computer vision tasks and has become a standard tool for image processing in many fields such as medical imaging, autonomous driving, and surveillance (Aurna et al. 2022; Yuan et al. 2023; Lee and Liu 2023; Patel and Elgazzar 2023). CNN has also been used for text classification, sentiment analysis, and question-answering (Zhou 2022; Habbat et al. 2022; Manmadhan and Kovoor 2023). It has been successfully applied to forecast floods from satellite images (Chen et al. 2021) and to predict the depth of urban flooding (Chen et al. 2023). In addition, CNN has shown good performance for flood susceptibility (Wang et al. 2020), flood forecasting (Kimura et al. 2019), and fluvial flood prediction (Kabir et al. 2020). However, the application of CNN in RFFA is limited.

DT is a non-parametric supervised learning algorithm, which has been widely adopted in different fields such as aquarium control systems, tourist behaviour, and the optimal location of solar power plants (González-Sánchez et al. 2022; Abdurohman et al. 2022; Shorabeh et al. 2022). It has shown good performance in hydrology such as flood susceptibility assessment (Khosravi et al. 2018; Chen et al. 2020; Ghosh et al. 2022). According to Tehrany et al. (2013), its robust predictive capabilities make it well-suited for generating susceptibility maps, capitalizing on its accurate prediction capacity; however, its utilization in RFFA remains relatively limited.

Support Vector Machine (SVM) is another popular algorithm based on statistical learning theory which was proposed by Vapnik and Chervonenkis (1974) and Vapnik (1995). Since the successful inception of SVM in other fields of hydrology (Wu et al. 2008; Pijush 2011), researchers were stimulated to apply it in RFFA. For example, Ghaderi et al. (2019) compared three data driven RFFA models, adaptive neuro-fuzzy inference system (ANFIS), SVM and genetic expression programming (GEP) where SVM outperformed the other methods. A similar result was also found by Sharifi Garmdareha et al. (2018). Haddad and Rahman (2020) applied multidimensional scaling (MDS) in RFFA, which is capable of developing a visual representation of similar catchments in either catchment characteristics or geographical data space. They found that the MDS-based SVR model with radial basis function (RBF) performed more consistently in RFFA. Vafakhah and Khosrobeigi Bozchaloei (2020) compared SVR, artificial neural network (ANN), and non-linear regression (NLR) in RFFA and found that SVR outperformed the other methods. Similar results were observed by Allahbakhshian-Farsani et al. (2020) who compared several AI-based RFFA models, multivariate regression spline (MARS), boosted regression trees (BRT) and projection pursuit regression (PPR) with NLR and found that SVR model with RBF functions performed better than others.

Today, RFFA remains an active area of research, with ongoing efforts to refine and improve the methodology to apply more accurate flood risk assessment in ungauged catchments. While AI-based methods have shown superior performance compared to traditional approaches and other fields within hydrology are exploring novel AI-based techniques, there has been limited investigation into the application of these new AI-based methods in RFFA. To fill this knowledge gap and build upon the successful application of CNN in other domains, this study introduces a CNN-based RFFA methodology. The CNN-based approach is compared with well-established techniques such as DT and SVM. Additionally, given the importance of interpretability in practical applications (Warner and Misra 1996), multiple linear regression (MLR) is also included in the study. It is expected that the outcomes of this study will assist in recommending AI-based RFFA models for practical applications in Australia and other countries.

2 Study area and data

This study selects south-east Australia since this part of Australia has the best quality streamflow data. South-east Australia consists of the states of Victoria and New South Wales. Victoria is dominated by winter rainfall. The Great Dividing Range (GDR) separates the coastal part from the inland regions of south-east Australia. The GDR starts in the state of Queensland and ends at the eastern edge of the state of Victoria, and is approximately 3500 km long. This study considers both sides of the GDR (inland and coastal) as a single region, based on previous studies by Ali and Rahman (2022) and Zalnezhad et al. (2022a, b).

For this study, 201 gauged catchments from south-east Australia are selected, with annual maximum flood (AMF) data series lengths ranging from 25 to 89 years. Figure 1 shows the locations of the selected catchments. The selected catchments are not affected by major land use change, which provides an opportunity to study the natural hydrological processes of these catchments. To calculate at-site flood quantiles, the log-Pearson Type 3 (LP3) distribution with a Bayesian parameter estimation technique was adopted using FLIKE software (Kuczera and Franks 2019). Six flood quantiles are used, corresponding to AEPs of 1 in 2 (Q₂), 1 in 5 (Q₅), 1 in 10 (Q₁₀), 1 in 20 (Q₂₀), 1 in 50 (Q₅₀) and 1 in 100 (Q₁₀₀). It should be noted that other flood frequency distributions could have been adopted, but the LP3 distribution generally performs better with Australian AMF data (Rahman et al. 2013) and hence it is adopted here.

Previous studies demonstrated that acceptable homogeneous regions cannot be established in Australia. For example, Ahmed et al. (2024) reported heterogeneity (H1) statistics in the range of 5.11–26.27 for south-east Australia (H1 values of 1.00 or smaller are needed for an acceptable region).

In this study, eight catchment characteristics are selected (Table 1) since these were found to be important in previous Australian RFFA studies (Haddad et al. 2012; Rahman et al. 2020; Zalnezhad et al. 2022a, b). These catchment characteristics are catchment area (AREA), rainfall intensity with 6 h duration and 1 in 2 AEP (I₆₂), mean annual rainfall (MAR), shape factor (SF), mean annual evapotranspiration (MAE), stream density (SDEN), slope of the central 75% of the mainstream (S1085) and forest (FOREST). A summary of the descriptive statistics of the selected catchment characteristics for the 201 study catchments is presented in Table 1. The boxplots of the selected catchment characteristics are presented in Fig. 2.

Table 1 Descriptive statistics of the selected catchment characteristics

In Fig. 2, the Y axis represents the measurement unit. It should be noted that in Fig. 2, the box width of AREA is 128 km² to 487 km². In Table 1, it can be seen that the smallest catchment is 3 km² and the largest one is 1010 km², with a median value of 261 km². AREA is generally considered the main scaling factor in RFFA as it directly influences flood volume from a given storm event and is directly related to the mean annual flood (Rahman 1997).

I₆₂ is another useful climatic characteristic in RFFA. Under the rational method, it is logical to use a rainfall intensity with a duration equal to the time of concentration (t_c). For the selected catchments, the mean t_c is 6.45 h. Since a rainfall intensity of fixed duration is preferable in RFFA studies, a duration of six hours is a logical choice as it is close to the mean t_c. In Fig. 2, the box width of I₆₂ is comparatively narrow, varying from 32.15 to 43.1 mm/h; however, a few high values are plotted as outliers in the boxplot.

MAR does not directly affect the flood generation process, but it is an indication of catchment wetness. MAR is included in this study as a candidate predictor variable. In Fig. 2, the box width of MAR ranges from 725.67 to 1125.7 mm, with a few outliers. MAE is selected in this study as it indicates catchment dryness. MAR, MAE and I₆₂ data are obtained from the Australian Bureau of Meteorology website. In Fig. 2, MAE shows a narrow box width, from 1024.5 to 1166.1 mm, with a few outliers.

Catchment shape has a direct impact on flood generation. The shape factor (SF) is obtained by dividing the shortest distance between the catchment centroid and outlet by the square root of the catchment area (Rahman et al. 2015). The higher the SF, the smaller the flood peak. In Table 1, it can be seen that the minimum SF value is 0.258 and the maximum is 1.63, with a median value of 0.78; the box width of SF in Fig. 2 ranges from 0.6227 to 0.9246, with a few outliers. Stream density (SDEN) affects the flood generation process (a higher SDEN enhances the drainage efficiency of a catchment). In Fig. 2, it can be seen that the SDEN box width ranges from 1.38 to 2.67 km⁻¹, with a median value of 1.69 km⁻¹.

Slope is one of the key catchment characteristics affecting the flood generation process. A higher slope reduces the travel time of runoff by increasing velocity. Benson (1959) mentioned that S1085 gave the best prediction of the mean annual flood. Hence S1085 is adopted in this study. To express S1085, if L is the mainstream length of the catchment, E1 is the elevation at the 0.1L position and E2 is the elevation at the 0.85L position along the mainstream from the catchment outlet, and E is the difference between elevations E1 and E2, then S1085 is the ratio of E to L. In Table 1, it can be seen that S1085 has a range of 0.8–69.9 m/km, which represents a wide variation (indicating that some of the catchments are very flat and some are very steep). In Fig. 2, the box width of S1085 varies from 5.48 to 16.48 m/km; however, there are a few higher values.

The flood generation process can be delayed by an increase in forest area (FOREST), as forest aids infiltration and reduces flow velocity. Here, FOREST indicates the fraction of the catchment area that is forested. The box width of FOREST in Fig. 2 varies from 0.22 to 0.89, and in Table 1 the minimum value is 0.0001 and the maximum value is 1, with a median value of 0.59. It can be observed from Fig. 2 and Table 1 that some of the selected catchments are highly forested, while some have little forest area. SF, SDEN, S1085 and FOREST data are obtained from 1:100,000 topographic maps of the selected catchments. The 1:100,000 map scale is selected based on previous studies (Rahman et al. 2009; Rahman and Rahman 2020).

3 Methodology

Figure 3 presents the overall methodology adopted in this study. First, background knowledge on AI-based RFFA techniques was gained from a literature review. The next step was the selection of the study area and the collation of the streamflow and catchment characteristics data sets. Thereafter, CNN, DT, SVM and MLR models were developed for six AEPs (Q₂, Q₅, Q₁₀, Q₂₀, Q₅₀ and Q₁₀₀) and tested using split-sample validation, in which the data set is divided randomly into training and testing subsets. The results of all the models were then compared using nine statistical measures, which are presented in Sect. 3.6.

The Adam (adaptive moment estimation) optimizer was used to train the CNN models. It is a stochastic gradient descent method (Okewu et al. 2020) that adapts the learning rate of each parameter based on its historical gradients and momentum, and it can adjust the parameters of a neural network in real time to improve its accuracy and speed. For the other regression models (SVM, MLR and DT), Bayesian optimization was used. It uses Bayes' formula to obtain the posterior information of the function distribution and combines the prior information of the unknown function with the sample information (Wu et al. 2019).
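As an illustration of the per-parameter adaptation described above, the following is a minimal NumPy sketch of the standard Adam update rule. It is not the training code used in this study: the function name `adam_step`, the default hyperparameters (the commonly quoted values of 0.9, 0.999 and 1e-8) and the toy quadratic objective are all assumptions made for the example.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and both moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction (t counts from 1)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter scaled step
    return param, m, v

# Toy usage (hypothetical objective): minimise f(w) = (w - 3)^2 starting from w = 0.
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 5001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
```

On the first step the bias-corrected ratio reduces to roughly the sign of the gradient, so the initial step size is close to the learning rate regardless of the gradient's magnitude.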

3.1 Split sample validation

According to Muraina (2022), how the data set is split is crucial. The model learns the mapping from inputs to outputs from the training data set, so the training data set should contain enough information for the model to learn the pattern, and the testing data set should be as representative of the data as possible so that the model's performance can be assessed. Gholamy et al. (2018) carried out an empirical study on avoiding overfitting and suggested that using 70–80% of the data for training and 20–30% for testing gives the best results.

In this study a split sample validation (80–20%) technique is adopted to compare the performance of the CNN model with MLR, SVM and DT models. As stated by Gholamy et al. (2018), 70–80% data for training and 20–30% data for testing are appropriate. Out of the 201 selected catchments, 161 (80%) catchments are selected randomly for training, and the remaining 40 (20%) catchments are used for model testing.
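The 80–20% random split can be sketched as follows. This is only an illustration of the arithmetic (201 catchments giving 161 for training and 40 for testing); the random seed and the variable names are assumptions, and the actual catchments drawn in this study are not reproduced.

```python
import numpy as np

N_CATCHMENTS = 201      # number of study catchments (from the text)
TEST_FRACTION = 0.20    # 80-20% split (from the text)

rng = np.random.default_rng(seed=42)            # seed is an arbitrary assumption
indices = rng.permutation(N_CATCHMENTS)         # random ordering of catchment indices

n_test = round(N_CATCHMENTS * TEST_FRACTION)    # 40 test catchments
test_idx = np.sort(indices[:n_test])            # held out for model testing
train_idx = np.sort(indices[n_test:])           # remaining 161 used for training
```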

3.2 Convolutional Neural Network (CNN)

A CNN is a form of deep learning model that processes data arranged in a grid pattern. A CNN architecture usually consists of different building blocks. For this study, a CNN regression model is used, which contains convolutional layers with the rectified linear unit (ReLU) and dropout layers, followed by one fully connected layer and a regression layer. The convolutional layer is the fundamental layer in a CNN architecture: it extracts features by performing the convolution operation, which is linear, and the ReLU activation function then applies a nonlinear operation that sets all negative values to zero.

In this study, the numerical values of the variables are treated as an image input and converted to an array of numbers. Because digital image pixel values are stored in a two-dimensional (2D) grid, a 2D-CNN is used. A small grid of weights, the filter known as the kernel (an optimizable feature extractor), slides across the input image with a given stride. The stride is the number of positions the kernel moves at each step (Yamashita et al. 2018). To process the image more precisely, padding is added around the border of the image so that the kernel can cover its edges. This process continues until the kernel has moved across the whole image. The output is then used as the input to the next layer.

The outputs of a convolution layer are then passed through the ReLU activation function. After the CNN layer and ReLU layer, the dropout layer has been added. Dropout is a technique where nodes drop randomly along with their connections in the network during training which prevents the network from overfitting and shows significant improvements (Lim 2021).
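The convolution, ReLU and dropout steps described above can be illustrated with a small NumPy sketch. It is a toy forward pass, not the architecture of Fig. 4: the function names, the 4×4 input, the 3×3 kernel of ones and the dropout rate are all assumptions, and, as in most deep learning libraries, the "convolution" is implemented as a sliding dot product without flipping the kernel.

```python
import numpy as np

def conv2d_relu(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded 2D `image`, then apply ReLU."""
    padded = np.pad(image, padding, mode="constant")   # zero padding on all sides
    kh, kw = kernel.shape
    out_h = (padded.shape[0] - kh) // stride + 1
    out_w = (padded.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride              # top-left corner of the window
            out[i, j] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return np.maximum(out, 0.0)                        # ReLU: negatives become zero

def dropout(x, rate, rng):
    """Training-time (inverted) dropout: zero a random fraction `rate` of units."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)                     # rescale so the expected value is unchanged

# Toy usage with hypothetical values.
image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
feature_map = conv2d_relu(image, kernel, stride=1, padding=1)   # same 4x4 size as the input
dropped = dropout(feature_map, rate=0.2, rng=np.random.default_rng(0))
```

With a padding of 1 and a stride of 1, a 3×3 kernel returns a feature map of the same size as the input, which is why padding lets the kernel cover the image border.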

After the convolution-ReLU-dropout sequence has been repeated a few times, the outputs are transformed into a one-dimensional (1D) array of numbers (a vector), and a fully connected layer with learnable weights is added. The size of the step by which the weights are updated during training is referred to as the “learning rate”. During the training of the convolutional network, the kernels and weights are used to compute outputs through forward propagation and are updated through backpropagation. Before training starts, the hyperparameters, such as the kernel size, number of kernels, padding and stride, are set. After training, the test data set is passed to the model, and the predicted outputs are compared with the original outputs of the test data set to evaluate the performance of the model. Figure 4 shows an overview of the developed CNN architecture and the training process adopted in this study.

3.3 Support vector machine (SVM)

SVM is a machine learning algorithm used for both classification and regression tasks. The basic idea behind SVM is to find the optimal hyperplane, that is, the one that separates the data points into different classes (or, in regression, predicts the response variable) with the maximum margin. The margin is the distance between the hyperplane and the closest data points, and the SVM algorithm aims to maximize this distance while minimizing the prediction error. In SVM regression, the kernel function plays an important role in transforming the input variables into a higher-dimensional space where the relationship between the predictors and the response variable may be more linear. One common kernel function used in SVM regression is the radial basis function (RBF) kernel, a Gaussian function that maps the input variables to an infinite-dimensional space. Other kernel functions can also be used in SVM regression, including the polynomial kernel, which can model non-linear relationships between the predictor variables and the response variable. When the degree of the polynomial kernel is set to 3, the model is referred to as the cubic SVM. The cubic SVM is useful in situations where the relationship between the predictor variables and the response variable is highly non-linear and cannot be captured by a linear or RBF kernel. Bagasta et al. (2019) compared cubic SVM and Gaussian SVM for detecting ischemic stroke and found that cubic SVM performed best for infarction classification. Hence, cubic SVM is used in this study.
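For concreteness, the two kernel functions mentioned above can be written out as follows. This is a minimal sketch of the standard textbook forms; the parameter names `coef0` and `gamma` and their default values are assumptions, since the kernel settings used in this study are not stated.

```python
import numpy as np

def cubic_kernel(x, y, coef0=1.0):
    """Polynomial kernel of degree 3: (x . y + coef0)^3."""
    return (np.dot(x, y) + coef0) ** 3

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

# Toy usage with two hypothetical predictor vectors.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
k_cubic = cubic_kernel(a, b)   # (1*3 + 2*4 + 1)^3 = 12^3
k_rbf = rbf_kernel(a, b)       # exp(-8), since ||a - b||^2 = 8
```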

3.4 Decision tree (DT) regression

In DT regression, the leaf nodes of the tree represent the predicted values for the output variable, and the path from the root node to a leaf node represents the decision process that led to the prediction (Freund and Mason 1999). The tree is built by recursively partitioning the input data into smaller subsets based on the values of the input variables, at each node choosing the split that minimizes the mean squared error (MSE) of the prediction. A Fine Tree (TF) with many small leaves is usually highly accurate on the training data and has a highly flexible response function. However, a very leafy tree tends to overfit, and its validation accuracy is often far lower than its training accuracy. In contrast, a Coarse Tree (TC) can be more robust and has a coarse response function. In between, a Medium Tree (TM) has a less flexible response function than the Fine Tree, with a minimum leaf size of 12 compared with the Fine Tree's minimum leaf size of 4 and the Coarse Tree's minimum leaf size of 36.
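The split search at a single node, and the effect of the minimum leaf size, can be sketched as follows for one predictor. This is an illustrative simplification (one variable, one node), not the software routine used in this study; the function name `best_split` and the toy data are assumptions.

```python
import numpy as np

def best_split(x, y, min_leaf=1):
    """Return (threshold, mse) of the split on x that minimises the MSE of y,
    keeping at least `min_leaf` samples on each side; (None, inf) if impossible."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    n = len(xs)
    best = (None, np.inf)
    for i in range(min_leaf, n - min_leaf + 1):   # i = number of samples in the left leaf
        if xs[i - 1] == xs[i]:
            continue                              # cannot split between identical values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse / n < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2.0, sse / n)
    return best

# Toy usage with hypothetical values.
x = [1, 2, 3, 10, 11, 12]
y = [5, 5, 5, 20, 20, 20]
threshold, mse = best_split(x, y, min_leaf=1)
```

A larger minimum leaf size rules out more candidate splits, which is what makes the medium and coarse trees less flexible than the fine tree.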

Several studies in other fields have compared the performance of different DT models. For example, Yaman et al. (2020) compared TF, TM and TC for estimating energy consumption and found that TF performed best. In another study, on wind speed estimation, AKINCI and NOĞAY (2019) found that TC performed best among the three DT models (TF, TM, TC). To select the best-performing DT model in this study, three DT models (TF, TM and TC) were tested, and the TM model was chosen because it had the lowest RMSE value.

3.5 Multiple linear regression (MLR)

MLR can be used to develop a prediction equation in RFFA. In this study, the ordinary least squares (OLS) method is used to estimate the coefficients of the regression equation. The OLS estimate coincides with the maximum likelihood estimate of the parameters, and gives unbiased and minimum variance estimates, where the errors are independent, identically and normally distributed (Draper and Smith 1998; Pandey and Nguyen 1999; Haddad and Rahman 2012). The adopted form of MLR is expressed by Eq. 1:

$$ \begin{aligned} Q_{T} = {} & b_{0} + b_{1} \left( \text{AREA} \right) + b_{2} \left( I_{62} \right) + b_{3} \left( \text{MAR} \right) + b_{4} \left( \text{SF} \right) \\ & + b_{5} \left( \text{MAE} \right) + b_{6} \left( \text{SDEN} \right) + b_{7} \left( \text{S1085} \right) + b_{8} \left( \text{FOREST} \right) \end{aligned} $$

(1)

where Q_T is the flood quantile with an AEP of 1 in T, b₀ is the intercept of the regression equation and b₁, b₂, …, b₈ are the regression coefficients.
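Fitting Eq. 1 by OLS can be sketched with NumPy as below. The data are synthetic and noise-free, generated only to show that the fit recovers known coefficients; they are not the study data, and the function names `fit_mlr` and `predict_mlr` are assumptions.

```python
import numpy as np

def fit_mlr(X, q):
    """OLS fit of q = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, q, rcond=None)   # least-squares solution
    return coef

def predict_mlr(coef, X):
    """Apply the fitted equation to new predictor values."""
    return coef[0] + np.asarray(X) @ coef[1:]

# Synthetic illustration: 161 "training catchments" with 8 predictors each.
rng = np.random.default_rng(0)
X_train = rng.random((161, 8))
true_b = np.arange(1.0, 10.0)                      # hypothetical b0..b8
q_train = true_b[0] + X_train @ true_b[1:]
coef = fit_mlr(X_train, q_train)
q_fit = predict_mlr(coef, X_train)
```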

3.6 Statistical indices

The following nine statistical indices (Eqs. 2–10) are adopted to compare the performances of the developed RFFA models:

Q_pred/Q_obs ratio (Q_r):

$$Q_{r}= \frac{Q_{pred}}{Q_{obs}}$$

(2)

Relative error (RE):

$$RE=\frac{Q_{pred}-Q_{obs}}{Q_{obs}}\times 100$$

(3)

Median absolute relative error (RE_r):

$$RE_{r}=median[abs(RE)]$$

(4)

Mean square error (MSE):

$$MSE = mean\left[{\left(Q_{pred} - Q_{obs}\right)}^{2}\right]$$

(5)

Root mean square error (RMSE):

$$RMSE = \sqrt{MSE}$$

(6)

Bias:

$$Bias=mean\left(Q_{pred}-Q_{obs}\right)$$

(7)

Relative bias (RBias):

$$RBias = \left[mean\left(\frac{{Q}_{pred}-{Q}_{obs}}{{Q}_{obs}}\right)\right]\times 100$$

(8)

Relative root mean square error (RRMSE):

$$RRMSE = \frac{\sqrt{mean \left[{\left({Q}_{pred}-{Q}_{obs}\right)}^{2}\right]}}{mean\left({Q}_{obs}\right)}$$

(9)

Root mean square normalised error (RMSNE):

$$RMSNE= \sqrt{mean\left[{\left(\frac{{Q}_{pred}-{Q}_{obs}}{{Q}_{obs}}\right)}^{2}\right]}$$

(10)

where Q_obs is the observed flood quantile from at-site flood frequency analysis by LP3 distribution at a given test catchment, and Q_pred is the predicted flood quantile obtained from the developed RFFA models for the test catchment.
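Eqs. 2–10 can be computed together as in the sketch below. The function name `rffa_indices` and the three example quantile values are assumptions used only for illustration; they are not results from this study.

```python
import numpy as np

def rffa_indices(q_pred, q_obs):
    """Compute the nine indices of Eqs. 2-10 for arrays of predicted and observed quantiles."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    err = q_pred - q_obs                 # absolute error per catchment
    rel = err / q_obs                    # error relative to the observed quantile
    mse = np.mean(err ** 2)
    return {
        "Qr": q_pred / q_obs,                        # Eq. 2 (one value per catchment)
        "RE": rel * 100.0,                           # Eq. 3 (one value per catchment, %)
        "REr": np.median(np.abs(rel * 100.0)),       # Eq. 4
        "MSE": mse,                                  # Eq. 5
        "RMSE": np.sqrt(mse),                        # Eq. 6
        "Bias": np.mean(err),                        # Eq. 7
        "RBias": np.mean(rel) * 100.0,               # Eq. 8
        "RRMSE": np.sqrt(mse) / np.mean(q_obs),      # Eq. 9
        "RMSNE": np.sqrt(np.mean(rel ** 2)),         # Eq. 10
    }

# Toy usage with hypothetical quantiles (m^3/s) for three test catchments.
stats = rffa_indices(q_pred=[110.0, 180.0, 400.0], q_obs=[100.0, 200.0, 400.0])
```

Note that RE and RBias keep the sign of the error, so over- and underestimation can cancel in RBias, whereas RE_r, RMSE and RMSNE measure magnitude only.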

4 Results

Each of the four developed RFFA models (CNN, SVM, TM and MLR) is tested on the test data set consisting of 40 catchments. Several statistical measures (Eqs. 2–10) and plots are used in this evaluation, as presented below.

Table 2 provides seven statistical measures (based on the test data set consisting of 40 stations) for the four models (CNN, SVM, TM, and MLR) and six flood quantiles. Table 2 reveals that the CNN model outperforms the other models in terms of several statistical measures across the quantiles. Specifically, the CNN model has the lowest RE_r values for five of the six quantiles, the exception being Q₂. It also has the lowest mean squared error (MSE) for four quantiles, the exceptions being Q₂ and Q₅. Furthermore, the CNN model achieves the lowest bias values for Q₂₀ and Q₁₀₀, as well as the lowest RBias value for Q₁₀. It also attains the lowest RMSE values for four quantiles (all except Q₂ and Q₅) and the lowest RMSNE values for four quantiles (all except Q₅ and Q₂₀). It should be noted that the error values in Table 2 are generally high, which is mainly due to the highly variable hydrology in Australia. The currently recommended RFFA technique (ARR-RFFE Model) in Australian Rainfall and Runoff (ARR) shows similar or higher error statistics.

Table 2 Statistical evaluations of the four different models and six flood quantiles

It should be noted that the CNN model does not achieve the lowest value across all the statistical measures. In some cases it is the second-best model, and in a few cases (Q₅-RMSNE, Q₅-RBias, Q₂₀-RBias, Q₅₀-RBias, and Q₁₀₀-RBias) it ranks third. Out of the 42 statistical measures examined (7 statistics × 6 quantiles in Table 2), the CNN model performs best in 24 measures, followed by MLR (8 measures), SVM (6 measures), and TM (4 measures). Moreover, the CNN model ranks second best in 13 measures and third best in 5 measures. While the TM model demonstrates the highest performance for most measures for Q₂, and the MLR model performs best for most measures for Q₅, the CNN model surpasses the other models in terms of most statistical measures for the remaining quantiles (Q₁₀, Q₂₀, Q₅₀, and Q₁₀₀).

Based on the findings presented in Table 2, it can be concluded that the CNN model exhibits superior performance overall compared to the other three models. The CNN model achieves the lowest values for the majority of statistical measures (24 of 42), indicating its strong predictive capability. To provide further insight into the performance of these models, Fig. 5 presents RE box plots, which offer a more detailed assessment of the CNN, SVM, TM, and MLR models across the six flood quantiles.

From Fig. 5, it is found that for Q₂, the smallest box width is associated with SVM, followed by CNN, TM and MLR (where CNN, SVM and TM have similar box width). For Q₅, the smallest box width is exhibited by SVM, followed by MLR, CNN and TM (SVM and MLR have similar box width, and CNN and TM have similar box width). For Q₁₀, the smallest box width is seen for CNN, followed by SVM, MLR and TM (CNN and SVM have similar box width). For Q₂₀, CNN shows the smallest box width, followed by SVM, MLR and TM (CNN and SVM have similar box width and TM and MLR have similar box width). For Q₅₀, SVM has the smallest box width, followed by CNN, TM and MLR (SVM, CNN and TM have similar box width, and box width for MLR is remarkably higher than the three other models). For Q₁₀₀, the smallest box width is provided by CNN, followed by SVM, MLR and TM (box width of TM is about double of the box width of CNN and SVM). Considering all the six quantiles, in terms of box width, CNN and SVM have similar results, which is much smaller than TM and MLR, in particular for higher return periods.

In Fig. 5, the median of each model is represented by a thick line within the box. When the median line of a model is located below the 0:0 reference line (in Fig. 5), the model overall underestimates the observed flood quantiles; if the median line is located above the 0:0 line, the model overall overestimates; and if the median line coincides with the 0:0 line, the model is the best in terms of bias. In terms of bias, the best result is found for CNN (Q₂ and Q₁₀₀), followed by MLR (Q₅, Q₁₀ and Q₁₀₀). Overall, in terms of bias (as seen in Fig. 5), CNN outperforms the other three models, and TM shows notable overestimation for all six flood quantiles.

The presence of outliers (an indication of gross overestimation or underestimation by a model) is of great importance as it influences the performance of a model. An increased number of outliers contributes to greater variability in model performance, thereby diminishing the statistical power of the model. Table 3 shows the number of outliers produced by each model, as per Fig. 5. Overall, CNN has the smallest number of outliers, followed by MLR, and SVM has the highest number of outliers.
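Counting boxplot outliers can be sketched as below. The whisker rule used to draw Fig. 5 is not stated in the text, so this example assumes the common convention of flagging values more than 1.5 times the interquartile range beyond the quartiles; the function name `count_outliers` and the sample RE values are likewise assumptions.

```python
import numpy as np

def count_outliers(values, whisker=1.5):
    """Count values beyond whisker * IQR from the quartiles (assumed boxplot rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])   # lower and upper quartiles
    iqr = q3 - q1                              # box width
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return int(np.sum((values < lower) | (values > upper)))

# Toy usage with hypothetical RE values (%), one of which is a gross overestimate.
n_out = count_outliers([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
```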

Table 3 Number of outliers for six quantiles and four models

Figure 6 displays boxplots of the Q_r (Q_pred/Q_obs) metric for the four selected models across the six quantiles. The median line within each box, indicated by a thick line, serves as an indicator of overall model performance, with a median line closer to 1 suggesting better performance. From the Q_r box plots for the six flood quantiles in Fig. 6, it is evident that the CNN model generally exhibits the narrowest Q_r box, with fewer outliers than the other three models. For Q₅, however, the SVM model has the narrowest Q_r box, although it also has five outliers. Similarly, the Q₅-MLR model shows a narrower box than Q₅-CNN, with both models having only one outlier. Consequently, Q₅-MLR performs better than Q₅-CNN in terms of Q_r. Considering all the Q_r box plots produced by the four models for the six quantiles in Fig. 6, it can be concluded that, overall, the CNN model exhibits the best performance compared to the other three models.

Table 4 presents the median values of Q_r for the four models. The CNN model achieves median Q_r values ranging from 0.82 to 1.14 across the six flood quantiles, whereas the other models display a wider range of median Q_r values. Based on the boxplots in Fig. 6 and the median values in Table 4, the CNN models demonstrate better overall performance than the other three models.
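As a small worked illustration of the Q_r metric, the sketch below computes Q_r = Q_pred/Q_obs for a handful of catchments and takes the median, which is the quantity summarised in Table 4. The flood values are made up for illustration and the function name `ratio_qr` is hypothetical; a value above 1 corresponds to overestimation and a value below 1 to underestimation.

```python
import numpy as np

def ratio_qr(q_pred, q_obs):
    """Return Q_r = Q_pred / Q_obs for each catchment."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    return q_pred / q_obs

# Hypothetical observed and predicted flood quantiles (m^3/s) for 5 catchments.
q_obs = [100.0, 50.0, 200.0, 80.0, 10.0]
q_pred = [110.0, 45.0, 200.0, 160.0, 41.0]

qr = ratio_qr(q_pred, q_obs)          # 1.1, 0.9, 1.0, 2.0, 4.1
median_qr = float(np.median(qr))      # middle value of the sorted ratios
n_over = int(np.sum(qr > 1.0))        # catchments that are overestimated
print(median_qr, n_over)
```

Note that the median is insensitive to the single very large ratio (4.1), which is why a few poorly predicted catchments can coexist with a median Q_r close to 1.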

Table 4 Median value of Q_r for six quantiles by four models

In summary, comparing the performance of the four models for the six quantiles using the selected statistical indices, the RE and Q_r box plots, the number of outliers produced by each model, and the median values of Q_r, it is evident that, overall, the CNN model outperforms the other three models. However, a few catchments performed poorly, producing high Q_r values in the CNN model, and they influence the median value of RE_r and the other statistical measures. In Figs. 7 and 8, these poorly performing catchments are denoted by their station names. Figure 7 shows the Q_r values for different flood quantiles of these five outlier catchments. The catchment characteristics of these five catchments are illustrated in Fig. 8, where the thick line shows the median value of each catchment characteristic based on the data of the 201 selected catchments. Typically, a Q_r value closer to 1 indicates better model performance. Examining the results for each catchment (Figs. 7 and 8), the Murrindindi River at Murrindindi above Colwells catchment consistently exhibits high Q_r values, ranging from 4.1 to 6.8 across the different flood quantiles. This catchment has a relatively small AREA and a very small MAE compared with the majority of the catchments. The Pranjip Creek at Moorilim catchment has a larger AREA but very small MAR, S1085 and FOREST values. The Big River d/s of Frenchman Creek Junction catchment has a very small SF and a higher FOREST. The Grampians Rd Br catchment is characterized by a very small AREA and a very high S1085, and the Avon River at Wimmera Highway catchment has very small MAR, S1085 and FOREST values. These unusual characteristics might have contributed to the poor performance of the CNN model for these catchments.

To investigate the CNN model performance in more depth, a few well-performing catchments, with Q_r values close to 1, were selected for analysis. Figure 9 shows the Q_r values for different flood quantiles of these five well-performing catchments. Their catchment characteristics are shown in Fig. 10, where the thick black line represents the median value across all the catchments selected in this study. Despite having the same AREA, the Murrindindi River at Murrindindi above Colwells catchment (Fig. 7) performed poorly, whereas the Wanalta catchment (Fig. 9) performed well. Figures 8 and 10 show that, compared with the Wanalta catchment, the Murrindindi River at Murrindindi above Colwells catchment has almost double the MAR, a very high S1085, and slightly higher SF, SDEN and I₆₂; that is, it is steeper and receives more rainfall at higher intensity. However, with its slightly higher SDEN and FOREST values, this catchment appears to drain efficiently and produces smaller observed flood quantiles. The CNN models were unable to learn this behavioural pattern of the Murrindindi River at Murrindindi above Colwells catchment in this study.

The Big River d/s of Frenchman Creek Junction catchment (Fig. 8) and the Devlins Br catchment (Fig. 10) also have very similar AREA values, yet the Devlins Br catchment (Fig. 9) shows better Q_r values than the Big River d/s of Frenchman Creek Junction catchment (Fig. 7). Most of the predictor variables of the two catchments are almost the same, except MAR and SF. With a higher MAR but a lower SF than the Devlins Br catchment, the Big River d/s of Frenchman Creek Junction catchment produces smaller observed quantile values than the Devlins Br catchment (Table 5). The CNN models could not capture the pattern of the Big River d/s of Frenchman Creek Junction catchment and predicted almost the same quantile values for both catchments.

Table 5 Observed flood quantiles (Q₂, Q₅, Q₁₀, Q₂₀, Q₅₀ and Q₁₀₀) of the five poorly performing and five well-performing catchments during testing of the CNN models

The Pranjip Creek at Moorilim catchment (Fig. 8) and the Redesdale catchment (Fig. 10) are both large catchments, with areas of 787 km² and 629 km², respectively. However, with higher SDEN and FOREST values and a steeper slope (Fig. 10), the Redesdale catchment produces higher quantile values than the Pranjip Creek at Moorilim catchment (Table 5). The CNN model failed to capture the hydrological pattern of the Pranjip Creek at Moorilim catchment and predicted almost the same quantile values for both catchments. The Gerrang Br and Flowerdale catchments show close values for almost all predictor variables in Fig. 10, except for slightly different AREA and SF values, and both catchments show good Q_r values in Fig. 9.

To understand the learning phase of the CNN models, this study also investigates four catchments from the training data set, two of which performed poorly and two of which performed well. Table 6 shows these four catchments: Cudgee and Eungella are the poorly performing catchments, and Glencairn and Jacobs Ladder are the well-performing catchments.

Table 6 shows that the Cudgee and Glencairn catchments have the same AREA. Being flatter and having lower values for all other predictors, the Cudgee catchment produces significantly smaller quantile values than the Glencairn catchment (Table 7); however, the CNN models predicted similar quantile values for both catchments, which resulted in high Q_r values for the Cudgee catchment. The Jacobs Ladder catchment is much steeper than the Eungella catchment, but SF is almost the same for both. With lower I₆₂, MAR, MAE and SDEN values, the Jacobs Ladder catchment produces smaller quantiles than the Eungella catchment; the CNN models captured this well during training, and the Q_r values for the Jacobs Ladder catchment are close to 1 for all quantiles (Table 6). The Eungella catchment produces high values for all quantiles despite its flatter slope, having higher I₆₂, MAR, MAE and SDEN values. The CNN model failed to predict the quantiles of the Eungella catchment, and the values it predicted for the two catchments (Jacobs Ladder and Eungella) are close.

Table 6 Characteristics and Q_r values of the poorly performing and well-performing catchments during training

Table 7 Observed flood quantiles (Q₂, Q₅, Q₁₀, Q₂₀, Q₅₀ and Q₁₀₀) of the poorly performing and well-performing catchments during training

5 Discussion

Like any other neural network, a CNN relies on a large training data set to learn the patterns in the data. CNNs also tend to overfit; in this study, a dropout layer was used to reduce overfitting. Dropout layers reduce overfitting by randomly deactivating a fraction of the neurons during training, so that the network cannot rely too heavily on any particular neurons. In the CNN method, every convolution layer should be followed by an activation layer. In this study, the ReLU (Rectified Linear Unit) operation was used as the activation layer so that the network can account for non-linearity. Although the CNN is known to be a good pattern recognition model, in this study the CNN model had limited learning opportunity owing to the relatively small data set. In this regard, a Monte Carlo cross validation technique can be adopted (Haddad et al. 2013), which randomly splits the data into training and validation sets hundreds of times to evaluate the prediction error.
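The Monte Carlo cross validation idea mentioned above can be sketched in a few lines. The example below is a rough illustration under stated assumptions, not the procedure of Haddad et al. (2013) or of this study: it uses synthetic data, an ordinary least squares regression as a stand-in for the RFFA model, 200 repeats and a 20% validation fraction, all of which are hypothetical choices, as is the function name `monte_carlo_cv`.

```python
import numpy as np

def monte_carlo_cv(X, y, n_repeats=200, test_fraction=0.2, seed=0):
    """Repeat random train/validation splits and return the RMSE of each repeat.

    A linear least squares fit is used as a simple stand-in model (assumption).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)                  # new random split each repeat
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        a_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(a_train, y[train_idx], rcond=None)
        a_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = a_test @ coef
        errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return np.asarray(errors)

# Synthetic data: 201 "catchments" with 3 made-up predictor variables.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(201, 3))
y = 1.0 + X @ np.array([0.5, -0.2, 0.1]) + data_rng.normal(scale=0.1, size=201)

rmse = monte_carlo_cv(X, y)
print(f"mean RMSE = {rmse.mean():.3f}, std = {rmse.std():.3f}")
```

The spread of the RMSE values across repeats indicates how much the estimated prediction error depends on which catchments happen to fall in the validation set, which a single split-sample validation cannot show.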

Also, in future studies it would be worthwhile to create subsets of catchments based on homogeneity (Msilini et al. 2020). Selecting important features would also be beneficial. A sensitivity analysis of the input variables would further help to identify the best independent variables (Heidarpanah et al. 2023).

In this study, four different RFFA models (CNN, SVM, DT and MLR) are evaluated using data from 201 catchments in south-east Australia. Comparing the performance of these four models based on several statistical measures (Eqs. 2–10), it is found that the CNN regression model performs better than the other three models.

To further assess the performance of the CNN model developed in this study, a few statistical measures (RE_r, RBias, RMSE and RRMSE) from this study are compared with those of other RFFA studies. Our CNN models show RE_r values ranging from 29% to 44%, which are comparable to those of Ali and Rahman (2022), who reported RE_r values in the range of 28% to 39% for a kriging based RFFA model for the states of NSW and Victoria in Australia. In another study, Noor et al. (2022) found RE_r values ranging from 16% to 41% for Victoria using a generalized additive model (GAM). Rahman and Rahman (2020) noted RE_r values between 22% and 37% for their index flood method (IFM) for NSW. Zalnezhad et al. (2022a) developed a quantile regression technique (QRT) for NSW and Victoria and found RE_r ranging from 36% to 48%, and for their ANN model the RE_r values were in the range of 33–54%. A recent study by Zalnezhad et al. (2023) using the IFM found RE_r values ranging from 32% to 46%. Aziz et al. (2015) found RE_r values ranging from 37% to 72% for south-east Australia for their GAANN model. In another study, Aziz et al. (2017) found RE_r ranging from 36% to 46% based on their ANN model for south-east Australia. The ARR RFFA model (Rahman et al. 2019) reported RE_r ranging from 49% to 59% for eastern Australia; however, it should be noted that the ARR RFFA model used 558 catchments and a leave-one-out validation technique to evaluate model accuracy, which is more rigorous than the split-sample validation technique adopted in this study.

In relation to RBias, the CNN model developed in this study shows values in the range of 14 to 43, whereas Zalnezhad et al. (2022a) found values in the range of 32–57 for their QRT model in south-east Australia. In another study, Shu and Ouarda (2008) found RBias ranging from −11 to −8 using an ANFIS model in Quebec, Canada. In terms of RMSE, the lowest value found in this study is 14.78. Allahbakhshian-Farsani et al. (2020) used an SVM model and found a lowest RMSE of 50.7. Zalnezhad et al. (2022a) used an ANN method and found a lowest RMSE of 50.15. This study found RRMSE values ranging from 0.61 to 0.85 for the CNN method. These RRMSE values closely align with those of Ouarda and Shu (2009), who used an ANFIS model and found RRMSE ranging from 0.57 to 0.64. Zalnezhad et al. (2022a) found RRMSE values in the range of 0.79 to 1.02 for their ANN based RFFA model. Another study by Zalnezhad et al. (2022a, b) found RRMSE values in the range of 0.75 to 1.01 for the QRT method. Zalnezhad et al. (2023) used the IFM and found RRMSE values in the range of 0.74 to 1.12. Overall, the CNN model developed in this study performs better than most of the previously reported similar studies.

6 Conclusion

This study focuses on RFFA in south-east Australia using data from 201 catchments. It compares a CNN based RFFA model with MLR, SVM and DT based RFFA models. The performance of these models is evaluated using a split-sample validation technique based on nine statistical measures for six different flood quantiles (Q₂, Q₅, Q₁₀, Q₂₀, Q₅₀ and Q₁₀₀). It is found that the CNN model performs best for AEPs in the range of 1 in 5 to 1 in 100, with median relative error in the range of 29% to 44%. The DT model shows better performance for the 1 in 2 AEP, with a median relative error of 24%. The CNN model outperforms the currently recommended RFFA model in the Australian Rainfall and Runoff guideline. The developed CNN based RFFA model also performs better than the models reported in most similar previous studies.

However, the CNN models face challenges in accurately predicting flood quantiles for certain catchments with extreme characteristics. To enhance the performance of CNN models in future studies, it is recommended to create subsets of catchments based on homogeneity and to conduct feature selection and sensitivity analysis for the input variables. By identifying important features, the use of independent variables can be optimized to improve model performance. Future studies should apply Monte Carlo and leave-one-out cross validation techniques to evaluate the CNN based RFFA model using data from other Australian states, which will assist in recommending more accurate RFFA techniques in the Australian Rainfall and Runoff guideline.

References

Abdurohman M, Putrada AG, Deris MM (2022) A robust internet of things-based aquarium control system using decision tree regression algorithm. IEEE Access 10:56937–56951
Ahmed A, Khan Z, Rahman A (2024) Searching for homogeneous regions in regional flood frequency analysis for Southeast Australia. J Hydrol Region Stud 53:101782
Akinci TÇ, Noğay HS (2019) Application of decision tree methods for wind speed estimation. Eur J Tech 9(1):74–83
Ali S, Rahman A (2022) Development of a kriging-based regional flood frequency analysis technique for South-East Australia. Nat Hazards 114(3):2739–2765
Allahbakhshian-Farsani P, Vafakhah M, Khosravi-Farsani H, Hertig E (2020) Regional flood frequency analysis through some machine learning models in semi-arid regions. Water Resour Manage 34(9):2887–2909
Archfield SA, Pugliese A, Castellarin A, Skøien JO, Kiang JE (2013) Topological and canonical kriging for design flood prediction in ungauged catchments: an improvement over a traditional regional regression approach? Hydrol Earth Syst Sci 17(4):1575–1588
Aurna NF, Yousuf MA, Taher KA, Azad AKM, Moni MA (2022) A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models. Comput Biol Med 146:105539
Australian Rainfall Runoff (2019) https://arr.ga.gov.au/
Aziz K, Rahman A, Fang G, Shrestha S (2014) Application of artificial neural networks in regional flood frequency analysis: a case study for Australia. Stoch Env Res Risk Assess 28(3):541–554
Aziz K, Rai S, Rahman A (2015) Design flood estimation in ungauged catchments using genetic algorithm-based artificial neural network (GAANN) technique for Australia. Nat Hazards 77:805–821
Aziz K, Haque MM, Rahman A, Shamseldin AY, Shoaib M (2017) Flood estimation in ungauged catchments: application of artificial intelligence based methods for Eastern Australia. Stoch Environ Res Risk Assess 31(6):1499–1514
Aziz K, Rahman A, Shamseldin A, Shoaib M (2013) Regional flood estimation in Australia: Application of gene expression programming and artificial neural network techniques. In: Proceedings of the 20th international congress on modelling and simulation, Adelaide, Australia, pp 1–6
Bagasta AR, Rustam Z, Pandelaki J, Nugroho WA (2019) Comparison of cubic SVM with Gaussian SVM: classification of infarction for detecting ischemic stroke. In: IOP conference series: materials science and engineering, vol 546, No. 5, p 052016. IOP Publishing
Bates BC, Rahman A, Mein RG, Weinmann PE (1998) Climatic and physical factors that influence the homogeneity of regional floods in southeastern Australia. Water Resour Res 34(12):3369–3381
Benson MA (1959) Channel-slope factor in flood-frequency analysis. J Hydraul Div 85(4):1–9
Bobee B, Cavadias G, Ashkar F, Bernier J, Rasmussen P (1993) Towards a systematic approach to comparing distributions used in flood frequency analysis. J Hydrol 142:21–36
Chebana F, Charron C, Ouarda TB, Martel B (2014) Regional frequency analysis at ungauged sites with the generalized additive model. J Hydrometeorol 15(6):2418–2428
Chen W, Li Y, Xue W, Shahabi H, Li S, Hong H, Ahmad BB (2020) Modeling flood susceptibility using data-driven approaches of naïve bayes tree, alternating decision tree, and random forest methods. Sci Total Environ 701:134979
Chen C, Hui Q, Xie W, Wan S, Zhou Y, Pei Q (2021) Convolutional neural networks for forecasting flood process in internet-of-things enabled smart city. Comput Netw 186:107744
Chen J, Li Y, Zhang S (2023) Fast prediction of urban flooding water depth based on CNN-LSTM. Water 15(7):1397
Cunnane C (1988) Methods and merits of regional flood frequency analysis. J Hydrol 100:269–290
Dawson CW, Abrahart RJ, Shamseldin AY, Wilby RL (2006) Flood estimation at ungauged sites using artificial neural networks. J Hydrol 319(1–4):391–409
Draper NR, Smith H (1998) Applied regression analysis, vol 326. Wiley
Esmaeili-Gisavandani H, Zarei H, Fadaei Tehrani MR (2023) Regional flood frequency analysis using data-driven models (M5, random forest, and ANFIS) and a multivariate regression method in ungauged catchments. Appl Water Sci 13(6):139
Filipova V, Hammond A, Leedal D, Lamb R (2022) Prediction of flood quantiles at ungauged catchments for the contiguous USA using Artificial Neural Networks. Hydrol Res 53(1):107–123
Freund Y, Mason L (1999) The alternating decision tree learning algorithm. In: ICML, vol 99, pp 124–133
Ghaderi K, Motamedvaziri B, Vafakhah M, Dehghani AA (2019) Regional flood frequency modeling: a comparative study among several data-driven models. Arab J Geosci 12(18):1–9
Gholamy A, Kreinovich V, Kosheleva O (2018) Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. Int J Intell Technol Appl Stat 11(2):105–111
Ghosh A, Dey P, Ghosh T (2022) Integration of RS-GIS with frequency ratio, fuzzy logic, logistic regression and decision tree models for flood susceptibility prediction in lower gangetic plain: a study on Malda District of West Bengal, India. J Indian Soc Remote Sensing 50(9):1725–1745
Gilmore I, Stensmyr P, Babister M, Retallick M, Ball J (2014) Comparison of regional flood methods in New South Wales. In: Hydrology and water resources symposium 2014, pp 836–843. Engineers Australia, Barton, ACT
González-Sánchez A, Monge-Martínez J, Ballesteros-López L, Armas-Arias S (2022) Logistic regression model and decision trees to analyze changes in tourist behavior: Tungurahua case study. In: Emerging research in intelligent systems: proceedings of the CIT 2021 volume 2, pp 210–221. Springer International Publishing, Cham
Habbat N, Anoun H, Hassouni L (2022) Combination of GRU and CNN deep learning models for sentiment analysis on French customer reviews using XLNet model. IEEE Eng Manage Rev 51(1):41–51
Haddad K, Rahman A (2012) Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework–quantile regression vs. parameter regression technique. J Hydrol 430:142–161
Haddad K, Rahman A (2020) Regional flood frequency analysis: evaluation of regions in cluster space using support vector regression. Nat Hazards 102(1):489–517
Haddad K, Rahman A, Weinmann PE, Kuczera G, Ball J (2010) Streamflow data preparation for regional flood frequency analysis: lessons from southeast Australia. Australas J Water Resour 14(1):17–32
Haddad K, Rahman A, Zaman MA, Shrestha S (2013) Applicability of Monte Carlo cross validation technique for model development and validation using generalised least squares regression. J Hydrol 482:119–128
Heidarpanah M, Hooshyaripor F, Fazeli M (2023) Daily electricity price forecasting using artificial intelligence models in the Iranian electricity market. Energy 263:126011
Hosking JRM, Wallis JR (1993) Some statistics useful in regional frequency analysis. Water Resour Res 29(2):271–281
Jiang C, Jiang C, Chen D, Hu F (2022) Densely connected neural networks for nonlinear regression. Entropy 24(7):876
Jingyi Z, Hall MJ (2004) Regional flood frequency analysis for the Gan-Ming River basin in China. J Hydrol 296(1–4):98–117
Kabir S, Patidar S, Xia X, Liang Q, Neal J, Pender G (2020) A deep convolutional neural network model for rapid prediction of fluvial flood inundation. J Hydrol 590:125481
Khosravi K, Pham BT, Chapi K, Shirzadi A, Shahabi H, Revhaug I, Bui DT (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci Total Environ 627:744–755
Kimura N, Yoshinaga I, Sekijima K, Azechi I, Baba D (2019) Convolutional neural network coupled with a transfer-learning approach for time-series flood predictions. Water 12(1):96
Kirby W, Moss M (1987) Summary of flood frequency analysis in the United States. J Hydrol 96:5–14
Kuczera G, Franks S (2019) At-site flood frequency analysis. In: Ball et al. (eds) Australian rainfall & runoff, Chapter 2, Book 3, Commonwealth of Australia
Lee DH, Liu JL (2023) End-to-end deep learning of lane detection and path prediction for real-time autonomous driving. SIViP 17(1):199–205
Lim HI (2021) A study on dropout techniques to reduce overfitting in deep neural networks. In: Advanced multimedia and ubiquitous engineering: MUE-FutureTech 2020, pp 133–139. Springer, Singapore
Longman J, Bennett-Levy J, Matthews V, Berry H, Passey M, Rolfe M, Morgan G, Braddon M, Bailie R (2019) Rationale and methods for a cross-sectional study of mental health and wellbeing following river flooding in rural Australia, using a community-academic partnership approach. BMC Public Health 19:1255
Manmadhan S, Kovoor BC (2023) Object-assisted question featurization and multi-CNN image feature fusion for visual question answering. Int J Intell Inf Technol 19(1):1–19
Msilini A, Masselot P, Ouarda TB (2020) Regional frequency analysis at ungauged sites with multivariate adaptive regression splines. J Hydrometeorol 21(12):2777–2792
Muraina I (2022) Ideal dataset splitting ratios in machine learning algorithms: general concerns for data scientists and data analysts. In: 7th international Mardin Artuklu scientific research conference, pp 496–504
National Research Council (1988) Estimating probabilities of extreme floods: methods and recommended research, 141, National Academy Press, Washington D.C.
Noor F, Laz OU, Haddad K, Alim MA, Rahman A (2022) Comparison between quantile regression technique and generalised additive model for regional flood frequency analysis: a case study for Victoria, Australia. Water 14(22):3627
Okewu E, Misra S, Lius FS (2020) Parameter tuning using adaptive moment estimation in deep learning neural networks. In Computational science and its applications–ICCSA 2020: 20th international conference, Cagliari, Italy, July 1–4, 2020, Proceedings, Part VI 20, pp 261–272. Springer International Publishing
Ouarda TB, Shu C (2009) Regional low‐flow frequency analysis using single and ensemble artificial neural networks. Water Resour Res 45(11)
Ouarda TBMJ, Ba KM, Diaz-Delgado C, Carsteanu A, Chokmani K, Gingras H, Quentin E, Trujillo E, Bobée B (2008) Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study.
Pandey GR, Nguyen VTV (1999) A comparative study of regression based methods in regional flood frequency analysis. J Hydrol 225(1–2):92–101
Patel M, Elgazzar H (2023) Road object classification using CNN.
Pijush S (2011) Application of least square support vector machine (LSSVM) for determination of evaporation losses in reservoirs. Engineering
Pilgrim DH, Cordery I (1993) Chapter 9: flood runoff. Handbook of hydrology. McGraw-Hill, New York
Potter KW (1987) Research on flood frequency analysis: 1983–1986. Rev Geophys 25(2):113–118
Rahman A (2005) A quantile regression technique to estimate design floods for ungauged catchments in south-east Australia. Australas J Water Resour 9(1):81–89
Rahman AS, Rahman A (2020) Application of principal component analysis and cluster analysis in regional flood frequency analysis: a case study in New South Wales, Australia. Water 12(3):781
Rahman A, Haddad K, Zaman M, Kuczera G, Weinmann PE (2011) Design flood estimation in ungauged catchments: a comparison between the probabilistic rational method and quantile regression technique for NSW. Australas J Water Resour 14(2):127–139
Rahman SA, Rahman A, Zaman M, Haddad K, Ashan A, Imteaz MA (2013) A study on selection of probability distributions for at-site flood frequency analysis in Australia. Nat Hazards 69:1803–1813
Rahman AS, Khan Z, Rahman A (2020) Application of independent component analysis in regional flood frequency analysis: Comparison between quantile regression and parameter regression techniques. J Hydrol 581:124372
Rahman A, Bates BC, Mein RG, Weinmann E (1998) Regional flood frequency analysis for ungauged basins in south-eastern Australia. In 1998 Spring Meeting.
Rahman A, Haddad K, Kuczera G, Weinmann E (2009) Australian rainfall and runoff revision project 5: regional flood methods.
Rahman A, Haddad K, Haque M, Kuczera G, Weinmann P (2015) Australian rainfall and runoff project 5: regional flood methods: stage 3 report. Commonwealth of Australia (Geoscience Australia): Canberra, Australia.
Rahman A, Haddad K, Kuczera G, Weinmann PE (2019) Regional flood methods. In: Australian rainfall & runoff, Chapter 3, Book 3, edited by Ball et al., Commonwealth of Australia
Rahman A (1997) Flood Estimation for ungauged catchments: a regional approach using flood and catchment characteristics (Doctoral dissertation, Monash University).
Şen Z (1980) Regional drought and flood frequency analysis: Theoretical consideration. J Hydrol 46(3–4):265–279
Sharifi Garmdareh E, Vafakhah M, Eslamian SS (2018) Regional flood frequency analysis using support vector regression in arid and semi-arid regions of Iran. Hydrol Sci J 63(3):426–440
Shorabeh SN, Samany NN, Minaei F, Firozjaei HK, Homaee M, Boloorani AD (2022) A decision model based on decision tree and particle swarm optimization algorithms to identify optimal locations for solar power plants construction in Iran. Renew Energy 187:56–67
Shu C, Ouarda TB (2008) Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system. J Hydrol 349(1–2):31–43
Sivakumar B, Singh VP (2012) Hydrologic system complexity and nonlinear dynamic concepts for a catchment classification framework. Hydrol Earth Syst Sci 16(11):4119–4131
Smith A, Sampson C, Bates P (2015) Regional flood frequency analysis at the global scale. Water Resour Res 51(1):539–553
Stedinger JR, Tasker GD (1985) Regional hydrologic analysis: 1. Ordinary, weighted, and generalized least squares compared. Water Resour Res 21(9):1421–1432
Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J Hydrol 504:69–79
Vafakhah M, Khosrobeigi Bozchaloei S (2020) Regional analysis of flow duration curves through support vector regression. Water Resour Manage 34(1):283–294
Vapnik V, Chervonenkis A (1974) Theory of pattern recognition
Vapnik VN (1995) The nature of statistical learning theory
Wang Y, Fang Z, Hong H, Peng L (2020) Flood susceptibility mapping using convolutional neural network frameworks. J Hydrol 582:124482
Warner B, Misra M (1996) Understanding neural networks as statistical tools. Am Stat 50(4):284–293
Wu D, He Y, Feng S, Sun DW (2008) Study on infrared spectroscopy technique for fast measurement of protein content in milk powder based on LS-SVM. J Food Eng 84(1):124–131
Wu J, Chen XY, Zhang H, Xiong LD, Lei H, Deng SH (2019) Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electron Sci Technol 17(1):26–40
Yaman O, Yetis H, Karakose M (2020) Decision tree based customer analysis method for energy planning in smart cities. In 2020 International conference on data analytics for business and industry: way towards a sustainable economy (ICDABI), pp 1–4. IEEE
Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629
Yuan F, Zhang Z, Fang Z (2023) An effective CNN and Transformer complementary network for medical image segmentation. Pattern Recogn 136:109228
Zalnezhad A, Rahman A, Vafakhah M, Samali B, Ahamed F (2022a) Regional flood frequency analysis using the FCM-ANFIS algorithm: a case study in South-eastern Australia. Water 14(10):1608
Zalnezhad A, Rahman A, Nasiri N, Vafakhah M, Samali B, Ahamed F (2022b) Comparing performance of ANN and SVM methods for regional flood frequency analysis in South-East Australia. Water 14(20):3323
Zalnezhad A, Rahman A, Ahamed F, Vafakhah M, Samali B (2023) Design flood estimation at ungauged catchments using index flood method and quantile regression technique: a case study for South East Australia. Nat Hazards, pp 1–24
Zhou H (2022) Research of text classification based on TF-IDF and CNN-LSTM. J Phys Conf Ser 2171(1):1012021
Zorn CR, Shamseldin AY (2015) Peak flood estimation using gene expression programming. J Hydrol 531:1122–1128

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Western Sydney University, Building XB, Second Avenue, Kingswood, NSW, 2751, Australia
Nilufa Afrin, Farhad Ahamed & Ataur Rahman

Authors

Nilufa Afrin
Farhad Ahamed
Ataur Rahman

Corresponding author

Correspondence to Ataur Rahman.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest and that no funding was received to carry out this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Afrin, N., Ahamed, F. & Rahman, A. Development of a convolutional neural network based regional flood frequency analysis model for South-east Australia. Nat Hazards (2024). https://doi.org/10.1007/s11069-024-06669-z

Received: 03 November 2023
Accepted: 05 May 2024
Published: 14 May 2024
DOI: https://doi.org/10.1007/s11069-024-06669-z
