Introduction

Cities in low-lying coastal areas are vulnerable to flooding (Patel et al. 2019). The increasing frequency of flood events has caused serious economic losses and social impacts, so reliable flood forecasts are important for mitigating flood damage. Data-driven models, which are trained and calibrated on hydrological data, are used to predict flooding. Compared with traditional numerical models, such models demand less domain knowledge to build, have a simpler structure, and compute faster, and they can yield more accurate results (Solomantine and A., 2008). Various machine learning methods are used to develop data-driven models, among which artificial neural networks (ANNs) are the most widely used.

Neural network models trained from data are affected by uncertainty. The sources of uncertainty and the ways of coping with them include (Abrahart and Anctil 2012; Herzog 2020): (i) Uncertainty of the input data. Models trained on larger datasets have generally been more representative than those trained on smaller datasets, though choosing the right data to train the model is the best way to improve its competitiveness. (ii) Uncertainty of the model structure. An optimal structure is important for fitting a model that adequately accounts for all parameters describing changes in flooding. (iii) Uncertainty of the parameters. The model parameters need to be optimized to obtain the most accurate output. Uncertainty affects the prediction results of the ANN model and thus reduces its potential applications (Kasiviswanathan and Sudheer 2013). Reducing model uncertainty is therefore crucial for both research and practical applications. Numerous approaches for dealing with uncertainty in neural networks have been put forth (Bowden et al. 2005a, 2005b). The importance of suitable input selection for ANN models in water resources applications has been discussed, and two input determination techniques, a model-free method and a self-organizing map method, have been proposed (Gomez et al. 2019). The impact of the uncertainties associated with weather forecasts and with tidal and storm surge predictions on the accuracy of a flood prediction model has been studied (Tiwari et al. 2010). An artificial neural network based on the bootstrap method was designed to quantify the uncertainty of the model parameters in flood prediction (Zhang and Shin 2021). A new uncertainty propagation scheme for neural networks based on the Gaussian Mixture Model was proposed in Kabir (2021). Uncertainty bounds have been used to quantify the uncertainty in the neural network output (Berkhahn et al. 2019; Jhong et al. 2017). Integration methods are therefore used to capture the impact of output uncertainty on flood prediction.

The studies above each address the uncertainty of neural networks from a single source. This study uses an ensemble artificial neural network (EANN) model to predict the degree of flooding in urban areas, combining three methods in order to reduce the overall uncertainty of the neural network and improve prediction accuracy. The first is a heuristic neural pathway strength feature selection (HNPSFS) method, which selects the best input variables from the candidate datasets. The second is a coupling method, which determines the number of hidden-layer neurons and the size of the training sample set, reducing network complexity and identifying the best network architecture. The third is an integration method: three neural networks with different prediction lead periods are run in parallel to form an ensemble, and their predictions are combined using a simple linear average ensemble, a weighted linear average ensemble, and a nonlinear neural ensemble. The best of these combined results is chosen as the final output, thus capturing the uncertainty of the network output. In addition, the performance of the ANN models is compared across training datasets of different lengths.

Materials and methods

Data acquisition

This paper uses a flood dataset from the Chinese city of Macao as a case study. The dataset, presented in an earlier paper by the authors (Dai et al. 2021), contains flood-related data for seven distinct typhoon events. The prediction model outputs the flood depth at future time intervals. Potential inputs to the model include the geographic parameters and flood depth of the flooded areas, the optimal typhoon path, urban meteorology, and tides, which can be categorized as invariant features and time-lag features.

Input variable selection

The first step in developing an artificial neural network is to determine the important input variables or characteristics. Selecting high-quality, representative data is integral to creating an accurate model. A large pool of potential input variables first needs to be identified, covering all variables that relate to the output being modeled, to minimize any loss of information. Variables that carry little information or mostly noise must then be eliminated so that the remaining variables are all relevant and an appropriate input set is reached. The flood prediction studied in this paper is a time series problem, so appropriate lags must also be chosen to maximize reliability. The output of the prediction model is given by the following formula:

$${D}_{(t+T)}=f\left(z, {u}_{\left(t-0*T\right)},{u}_{\left(t-1*T\right)},\dots ,{u}_{\left(t-m*T\right)}\right)$$
(1)

In Formula 1, \(z\) is the invariant feature of the input, \(u\) is the time-lag feature of the input, \(T\) is the lead period, and \(m\) is the largest lag. For example, with \(T=30\) min and \(m=2\), the model outputs the flood depth \({D}_{(t+30 {\text{min}})}\), that is, \(30\) min after the current time \(t\). The lagged inputs are the flood-related time series features \({u}_{\left(t-0*30 {\text{min}}\right)}\) at the current time, \({u}_{\left(t-1*30 {\text{min}}\right)}\) 30 min earlier, and \({u}_{\left(t-2*30 {\text{min}}\right)}\) 60 min earlier. Determining how many lagged values to include in each input time series is difficult (Bowden et al. 2005c). In typhoon events, flooding can be rapid and transient. For example, in typhoon “1713 HATO”, the flooding that caused the disaster in Macao occurred within 30 min. Considering the limited length of the flood-related time series in the datasets, the maximum lag \(m\) was set to \(2\), giving a maximum lagged time of \(2T\). Table 1 describes the potential input variables used in the model selection.
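The input layout implied by Formula 1 can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `build_lagged_inputs`, the array shapes, and the random toy data are all assumptions, and the series are assumed to be already sampled at intervals of \(T\).

```python
import numpy as np

def build_lagged_inputs(z, u, depth, lead_steps, m):
    """Assemble (inputs, targets) pairs with the layout of Formula 1.

    z          : (n_z,) invariant features, repeated for every sample
    u          : (n_t, n_u) time-lag features, one row per interval T
    depth      : (n_t,) flood depth series, one value per interval T
    lead_steps : lead period in units of T (1 means predicting t + T)
    m          : largest lag, so lags 0..m are included
    """
    n_t = u.shape[0]
    rows, targets = [], []
    # t needs m earlier rows behind it and lead_steps rows ahead of it
    for t in range(m, n_t - lead_steps):
        lags = [u[t - k] for k in range(m + 1)]  # u(t), u(t-T), ..., u(t-mT)
        rows.append(np.concatenate([z, *lags]))
        targets.append(depth[t + lead_steps])
    return np.array(rows), np.array(targets)

# Toy data with illustrative sizes (hypothetical, not the Macao dataset)
rng = np.random.default_rng(0)
z = rng.random(5)
u = rng.random((20, 10))
depth = rng.random(20)
X, y = build_lagged_inputs(z, u, depth, lead_steps=1, m=2)
print(X.shape, y.shape)
```

Each row of `X` holds the invariant features followed by the time-lag features at lags 0 to \(m\), and the matching entry of `y` is the depth one lead period ahead.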

Table 1 A summary of potential input variables

The heuristic neural pathway strength feature selection (HNPSFS) method is used to select the input features in this study. Forward selection is performed first: an initial network is constructed from a starting set of input features, and at each subsequent step a set of time-lagged features is added to the input and the model is retrained until it achieves the desired results. Backward selection is then performed using a neural pathway strength feature selection method (Usman et al. 2017). Uncorrelated variables are removed so that only the appropriate input features are retained. Formula 2 calculates the strength of a specific path through the neural network from a given input to an output:

$${W}_{IO}={W}_{{\text{IH}}}*{W}_{{\text{HO}}}$$
(2)

In Formula 2, \({W}_{{\text{IH}}}\) is the weight between the input layer and the hidden layer, and \({W}_{{\text{HO}}}\) is the weight between the hidden layer and the output layer. The larger \({W}_{{\text{IO}}}\) is, the more important the feature is, and the input variable is retained. A value of \({W}_{{\text{IO}}}\) of zero or below indicates that the input feature suppresses the output and should be removed.
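The strength calculation and the retention rule can be illustrated with a short sketch. It assumes a single hidden layer and assumes that the strengths of all hidden-layer paths from one input are summed through a matrix product; the source does not state how paths are aggregated. The function names and the weight values are hypothetical.

```python
import numpy as np

def pathway_strength(w_ih, w_ho):
    """Formula 2 for a single-hidden-layer network.

    w_ih : (n_inputs, n_hidden) input-to-hidden weights
    w_ho : (n_hidden, n_outputs) hidden-to-output weights
    Returns (n_inputs, n_outputs), summing over hidden neurons (an assumption).
    """
    return w_ih @ w_ho

def select_inputs(w_ih, w_ho, names):
    """Keep the inputs whose strength toward the single output is positive."""
    strength = pathway_strength(w_ih, w_ho)[:, 0]
    return [name for name, s in zip(names, strength) if s > 0]

# Hypothetical weights: three inputs, two hidden neurons, one output
w_ih = np.array([[0.8, 0.1],
                 [-0.5, 0.2],
                 [0.3, -0.9]])
w_ho = np.array([[1.0],
                 [0.5]])
print(pathway_strength(w_ih, w_ho)[:, 0])
print(select_inputs(w_ih, w_ho, ["a", "b", "c"]))
```

In this toy case only input `a` has a positive strength, so `b` and `c` would be dropped during backward selection.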

Network architecture selection

Developing an artificial neural network involves quantifying the model coefficients and adapting the network architecture so that the network learns from the data more accurately. Network training, for example with the Levenberg–Marquardt algorithm, determines the weights and thresholds between the input and hidden layers and between the hidden and output layers, and appropriate transfer functions, generally sigmoid and linear functions, are selected. Established methods exist for these steps. In contrast, two other important parameters of the network architecture, the number of hidden-layer neurons and the size of the training dataset, currently have no agreed selection method and are usually set by trial and error (Abrahart and Anctil 2012). This study adopts a coupling method to select these two parameters. First, the initial number of neurons in the hidden layer is set using the following formula:

$${n}_{H}=\sqrt{m+l}+c,\quad c\in [1, 10]$$
(3)

In Formula 3, \({{\text{n}}}_{H}\) is the number of neurons in the hidden layer, \(m\) is the number of neurons in the input layer, and \(l\) is the number of neurons in the output layer. Starting from the calculated initial value of \({{\text{n}}}_{H}\), the value is then gradually reduced, with the network retrained at each step. The Mean Squared Error (MSE) and the number of training epochs are used as indicators of network performance in each iteration. The \({{\text{n}}}_{H}\) at which the indicators reach their minimum is selected as the number of neurons in the hidden layer.
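As a worked illustration of Formula 3, the sketch below computes a starting value of \({n}_{H}\) and lists the candidates visited by the downward search. Rounding up to a whole neuron, the choice of \(c\), and the layer sizes used in the example are assumptions for illustration; the function names are hypothetical.

```python
import math

def initial_hidden_neurons(n_inputs, n_outputs, c):
    """Starting value of n_H from Formula 3, rounded up to a whole neuron."""
    if not 1 <= c <= 10:
        raise ValueError("c is expected to lie in [1, 10]")
    return math.ceil(math.sqrt(n_inputs + n_outputs) + c)

def downward_candidates(n_start, n_min=1):
    """n_H values visited when reducing from the starting value."""
    return list(range(n_start, n_min - 1, -1))

# Illustrative sizes: 19 inputs, 1 output, c at its upper bound
n_start = initial_hidden_neurons(19, 1, c=10)
print(n_start)                           # 15
print(downward_candidates(n_start, 10))  # [15, 14, 13, 12, 11, 10]
```

With these assumed values, \(\sqrt{20}+10\approx 14.47\), which rounds up to 15. Each candidate would then be trained and compared on MSE and training epochs.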

Similarly, once the \({{\text{n}}}_{H}\) value has been selected, the full training data are sampled at different time intervals to derive datasets ranging from large to small. The network is trained on each subset, and the subset for which the MSE and training epochs are minimal is selected as the training dataset.

Ensemble artificial neural network

Integration methods can capture the effects of ANN output uncertainty, reduce output variance, and obtain more accurate results than individual models. Three ensemble techniques are adopted to improve the model’s performance: the simple average ensemble (SAE), the weighted average ensemble (WAE), and the nonlinear neural ensemble (NNE) (Nourani et al. 2018).

The prediction model must use an appropriate lead period. A longer lead period makes the model potentially more useful, but it also leads to greater uncertainty (Jhong et al. 2017). As noted above, urban flooding usually has a short lead period. This study adopts a forecast lead period of 30 min, based on the needs of the relevant urban management departments when issuing flood warnings. On this basis, three network models with prediction lead periods of 30 min, 60 min, and 120 min were constructed. They are all trained using the error backpropagation algorithm and are named BPNN30, BPNN60, and BPNN120. Error backpropagation is one of the most widely used algorithms in neural network training: by adjusting the weights and biases of the network, it minimizes the error function so that the network fits the training data better. This also provides a basis for examining a related issue, namely the impact of training dataset length on the performance of ANN models. The outputs of the three networks are aligned in time. The flood forecast with a lead time of 30 min is then produced by integrating these outputs with the SAE, WAE, and NNE techniques. The performance of the three techniques is compared, and the best result among them is selected as the final flood prediction. The structure of the ensemble neural network is shown in Fig. 1.

Fig. 1 Structure diagram of the EANN

Unlike general integration methods, this study does not integrate multiple types of models; instead, it integrates the results of neural network models with different prediction lead periods. This is a novel attempt to obtain a model with an appropriate lead period and high accuracy.
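The two linear ensembles can be sketched as below. This is only an illustration under stated assumptions: the weighting rule (weights inversely proportional to each member's MSE) is assumed, since the weighting scheme is not specified here, and the forecasts are hypothetical values taken to be already aligned in time. The NNE would replace the fixed combination with a small trained network and is not shown.

```python
import numpy as np

def simple_average(preds):
    """SAE: unweighted mean over member predictions (one row per member)."""
    return preds.mean(axis=0)

def weighted_average(preds, weights):
    """WAE: weighted mean with weights normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ preds

def inverse_mse_weights(preds, observed):
    """Assumed rule: members with lower MSE receive larger weights."""
    mse = ((preds - observed) ** 2).mean(axis=1)
    return 1.0 / mse

# Hypothetical time-aligned flood depths (m) from three member networks
observed = np.array([0.2, 0.8, 1.6, 1.1])
preds = np.array([
    [0.25, 0.85, 1.75, 1.15],  # member with 30 min lead period
    [0.10, 0.70, 1.70, 1.00],  # member with 60 min lead period
    [0.40, 0.50, 1.50, 1.40],  # member with 120 min lead period
])
sae = simple_average(preds)
wae = weighted_average(preds, inverse_mse_weights(preds, observed))
print(sae)
print(wae)
```

In practice the weights would be fitted on training or validation data rather than on the samples being evaluated.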

Model evaluation

This study aims to predict flood depth, a continuous variable, and therefore uses regression metrics to identify the most suitable model. The Root Mean Square Error (RMSE) and the Coefficient of Determination (\({R}^{2}\)) are used to evaluate the network model (Dai and Cai 2021). They are calculated as:

$${\text{RMSE}}(X,h)=\sqrt{\frac{1}{m}\sum_{i=1}^{m}{\left(h\left({x}_{i}\right)-{y}_{i}\right)}^{2}}$$
(4)
$${R}^{2}\left(X,h\right)=1-\frac{\sum_{i=1}^{m}{\left(h\left({x}_{i}\right)-{y}_{i}\right)}^{2}}{\sum_{i=1}^{m}{\left({y}_{i}-\overline{y}\right)}^{2}}$$
(5)

In these formulas, \(m\) is the number of samples, \(h\left({x}_{i}\right)\) is the predicted value of the i-th sample, \({y}_{i}\) is the observed value of the i-th sample, and \(\overline{y }\) is the mean of the observed values.
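A minimal sketch of the two metrics is given below. The function names and sample values are hypothetical, and \({R}^{2}\) is computed in its conventional form, with the total sum of squares taken over the observed values about their mean.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((predicted - observed) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical flood depths in metres
observed = [0.0, 0.5, 1.0, 1.5]
predicted = [0.1, 0.4, 1.1, 1.4]
print(round(rmse(predicted, observed), 3))       # 0.1
print(round(r_squared(predicted, observed), 3))  # 0.968
```

A lower RMSE and an \({R}^{2}\) closer to 1 both indicate a better fit.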

Results and discussion

Input selection

The HNPSFS method is used to select different input features and form multiple training subsets for training the BPNN30 model. Figure 2 shows the distribution of the training results as scatter plots. In the forward selection phase, Fig. 2a is the scatter plot for a total of 15 input features, comprising \(z\) and \({u}_{\left(t-0*60 {\text{min}}\right)}\). Figure 2b is the scatter plot after adding the 10 input features of \({u}_{\left(t-1*60 {\text{min}}\right)}\) to those of Fig. 2a. Figure 2c is the scatter plot after adding the 10 further input features of \({u}_{\left(t-2*60 {\text{min}}\right)}\). The prediction performance of the model improves significantly as the time-lagged features are added to the input. The training sample points, initially scattered on both sides of the trend line, gradually become more tightly clustered, and their distribution follows the trend line. At this point, adding more input variables is no longer meaningful and may only make the model more complex. Backward selection is then applied, removing 16 irrelevant input variables and retaining 19 significant input variables. Figure 2d shows the training results of the resulting model. Compared with Fig. 2c, Fig. 2d shows an equivalent regression effect, but with fewer input variables and a simpler model.

Fig. 2 Scatter plot of training model with different number of features. a 15; b 25; c 35; d 19

Figure 3 shows the intensity values \({W}_{IO}\) for the 35 feature variables derived during the HNPSFS process. Judging by the value of \({W}_{IO}\), the invariant features with a positive effect on flood depth prediction are the longitude (\({L}_{O}\)) and the latitude (\({L}_{a}\)) of the submerged area. The effect of the time-lagged features on the model output varies across the time-lagged periods. Urban rainfall (\(R,R\left(t-1\right), R(t-2)\)) is selected as an input in all three time-lagged periods, indicating that rainfall has a sustained influence on the output. In two time-lagged periods, typhoon motion longitude (\({{\text{T}}}_{{\text{yLO}}}, {{\text{T}}}_{{\text{yLO}}}(t-1)\)), typhoon center pressure (\({{\text{T}}}_{yP}(t-1), {{\text{T}}}_{yP}(t-2)\)), typhoon center wind speed (\({{\text{T}}}_{yW}(t-1), {{\text{T}}}_{{\text{yW}}}(t-2)\)), urban tide of Zhuhai (\({{\text{T}}}_{iZ}(t-1),{{\text{T}}}_{iZ}(t-2)\)), and flood depth of the submerged area (\({D}_{(t)}, {D}_{(t-2)}\)) are selected as inputs. In one time-lagged period, urban wind speed (\({W}_{S}\)), urban wind direction (\({W}_{d}(t-1)\)), and urban tide of Macao (\({{\text{T}}}_{iM}\)) are selected as inputs. The \({W}_{IO}\) values of the other features are negative, indicating that they have an inhibitory effect on model performance, so they are not selected as model inputs.

Fig. 3 Intensity values for the 35 feature variables

Table 2 further presents the \({W}_{IO}\) values of the 19 significant input variables. Among them, the \({W}_{IO}\) value of \({D}_{(t-2)}\) is the largest, at 1.502, which is 3 to 22 times the value of the other important input variables, indicating that the flood depth of the submerged area has the greatest influence on the predicted output. This result may also serve as a reminder to city managers that, in urban flood prediction, it is crucial to collect data on changes in flooded areas accurately and frequently. Urban weather and tides during the typhoon event, as well as the longitude and latitude of the submerged area, are more important than the optimal typhoon path. This suggests that the closer the typhoon path (for example, the longitude and latitude of the moving path) is to the city, the more directly the typhoon affects urban meteorology and offshore tides and the greater the likelihood of flooding, which has an important impact on the model output.

Table 2 \({W}_{IO}\) values of the 19 significant input variables

Unlike other input variable selection methods (such as correlation analysis, principal component analysis, and information entropy calculation), the HNPSFS method bases input selection on the performance of the artificial neural network itself. It thus helps to identify important input parameters and to reduce the number of input variables.

Structure selection

The choice of the number of neurons in the hidden layer, \({{\text{n}}}_{H}\), is important for determining the neural network architecture. Too large an \({n}_{H}\) value increases the system's complexity. Conversely, too small an \({n}_{H}\) value leaves the network unable to model nonlinear systems adequately and increases the uncertainty of the network architecture. \({n}_{H}\) is closely related to the model output and the input dimension. The 19 input variables are used together with the coupling method to select the best network architecture. The BPNN30 model is trained on a training set of 104,220 samples, with \({{\text{n}}}_{H}\) values between 1 and 80. Figure 4 shows the change in the MSE and epochs for the training dataset.

Fig. 4 A comparison of the performance of the ANN model with increasing model complexity. a MSE; b epochs

Formula 3 gives an initial value of \({n}_{H}=15\). In Fig. 4a, the MSE increases when \({n}_{H}\le 10\), peaks when \({n}_{H}=22\), and decreases when \({n}_{H}>23\). An increase in \({{\text{n}}}_{H}\) means an increase in the model’s complexity. This indicates that there is a best performance window when \(10\le {n}_{H}\le 21\). In Fig. 4b, when \({n}_{H}\le 5\), the model is at its simplest, but it also requires many iterations and a longer training time. When \({n}_{H}>5\), the number of epochs stays low. When \(10\le {n}_{H} \le 19\), there is an interval in which the number of epochs is small and changes only gently. Therefore, with a downward search starting from the initial \({n}_{H}\) value, the best network architecture lies in the \(10\le {n}_{H}\le 15\) window. The results show that, for a given input, output, and \({n}_{H}\), the optimal parameters and network architecture can be selected objectively by finding the smaller values of MSE and epochs (or training time).

The number and quality of samples in the training dataset directly affect the model’s performance and are another source of uncertainty in the network architecture. The quality of the data is often unknown, so the network performance can only be tested and analyzed, either by splitting samples or by extracting samples to build training datasets of different lengths. The flooding data used in this study are recorded once per minute, and the other input variables are also interpolated to one record per minute, forming a training set of 104,220 sample pairs. Data were extracted at 5-min, 15-min, 30-min, and 60-min intervals, which, together with the full set, gives five training datasets. Their lengths are 104,220, 20,844, 6,948, 3,474, and 1,737, corresponding to 100%, 20%, 6.67%, 3.33%, and 1.67% of the total sample size, respectively. Figure 5 shows the impact of long and short training datasets on the performance of the BPNN30 model. With \({n}_{H}=15\), the MSE of the model increases when the number of training samples is reduced from 100.00% to 20.00% of the total. When it is further reduced to 6.67% of the total, the MSE decreases, the epochs show a decreasing trend, and the network performance improves. As the number of training samples decreases further, the MSE and training time of the model increase and the network performance becomes worse. Urban flooding is a time series problem. Compared with the full sample data at one record per minute, the sample data drawn at 15-min intervals reduce redundant information while still describing the complex relationships of the flooding process concisely. The sample data drawn at 30-min to 60-min intervals lose many details and no longer represent the data well, which leads to poor performance of the data-driven model.
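The subset lengths quoted above follow directly from keeping every k-th one-minute record, as the short sketch below checks. It assumes simple fixed-stride sampling starting from the first record; the exact extraction procedure is not described in the text, and the integer array is only a stand-in for the real sample pairs.

```python
import numpy as np

def subsample(records, interval):
    """Keep every `interval`-th record of a series with one record per minute."""
    return records[::interval]

n_total = 104_220               # one-minute sample pairs reported in the text
records = np.arange(n_total)    # stand-in for the real sample pairs
for interval in (1, 5, 15, 30, 60):
    subset = subsample(records, interval)
    share = 100 * len(subset) / n_total
    print(f"{interval:>2}-min interval: {len(subset):>7} samples ({share:.2f}%)")
```

The printed lengths can be compared with the five dataset sizes and percentages listed in the paragraph.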

Fig. 5 A comparison of the performance of the ANN model with using long and short training datasets. a MSE; b epochs

At this stage, the purpose of ANN model training is not to obtain a suitable prediction network but to select the best network architecture. Here, the best network architecture is chosen as \({n}_{H}=15\), with the 15-min interval sampling subset as the training dataset. In terms of MSE, this model shows a low value with minimal variability, indicating that it lies within the optimal performance window. The number of training epochs is also low, reflecting the model's simplicity and short training time. In addition, increasing the amount of training data does not lead to significant improvements in performance, which further supports the adequacy of the model for the given task. Together, these attributes make the selected network preferable. Compared with the trial-and-error method, the proposed method sets the network architecture parameters on the basis of objective results, which can reduce the uncertainty of the neural network architecture.

Network output

Based on the 19 selected inputs and the best network architecture parameters, the BPNN30, BPNN60, and BPNN120 models are trained and tested. Figure 6 shows the test results for the three networks. Compared with the observed samples, the predictions of BPNN30 and BPNN60 accurately capture the occurrence of the flood peak and are well synchronized in time with both the rise and the recession of the flood. However, their predictions for flood peaks above 1.5 m are too high. The predictions of BPNN120 show multiple oscillations and jumps. Its prediction for peak flood levels above 1.5 m is close to the actual value, but with large delays or advances in time. The prediction output of each individual model is therefore insufficient, which can be compensated for by using integration techniques.

Fig. 6 Prediction results of the BPNN models on the test set

The integration learning framework centers on selecting and combining multiple base learners and is closely related to classical model selection theory. The methods used for model selection include cross-validation, grid search, and model evaluation. Typically, the goal is to identify an aggregated model whose generalization error is significantly lower than that of any standalone model. In this study, BPNN30 was the best individual model. The evaluation indicators of the three integrated models were therefore compared against those of BPNN30, which served as the benchmark for selecting the best integrated model. Table 3 presents the error statistics of the BPNN30 model and the EANN models on the test sets. \({R}^{2}\) and RMSE are used to evaluate model performance. Specifically, the SAE model has the lowest fit, with an \({R}^{2}\) value of 0.93969. The WAE and NNE models have a higher fit, approximately 2% higher than the BPNN30 model. All three integrated models obtain smaller RMSE values than the BPNN30 model: the reported decreases are 10% for the SAE model, 101% for the WAE model, and 120% for the NNE model. The integrated models reduce the variance of the individual models and yield better prediction performance.

Table 3 Error statistic of the models on the test set

Figure 7 shows the regression results of the BPNN30 model and the EANN models on the test set. The scatter of all four models follows the trend line. The sample points of the BPNN30 model mostly fall to the left of the trend line, meaning that the majority of predictions are higher than the expected value. The distribution of sample points improves in the SAE model, but where the observed value exceeds 1.5 m, the predicted points are more scattered and the predicted values are higher than the observed values. For the WAE and NNE models, the sample points are clustered more uniformly along the trend line, indicating that their predicted values are closest to the observed values and that the regression is more accurate.

Fig. 7 Result of multivariate regression analysis of the BPNN30 model and the ensemble models. a BPNN30 model; b SAE model; c WAE model; d NNE model

Figure 8 shows the test results of flood prediction 30 min in advance for the BPNN30 model and the EANN models. Compared with the BPNN30 model, the flood peaks predicted by the SAE model have smaller errors, but the jumps in flood timing during flooding are not improved. The WAE and NNE models perform similarly and predict both flood depth and timing well. Across the nine inundation areas, the WAE and NNE models predict the flooding process more smoothly and continuously, especially around the flood peaks. This indicates that the output of the ensemble neural network model is reliable. In areas three and four, the predicted rising edge of the flood occurs earlier than observed. In area eight, the predicted flood peaks are slightly lower than the observed values. The reasons behind these differences need to be explored further.

Fig. 8 A comparison of predicted and observed flood depth values. a BPNN30 model; b SAE model; c WAE model; d NNE model

Conclusion

Artificial neural networks are widely used in flood prediction, and the adverse effects of the various sources of uncertainty on such models need to be reduced. This paper studies the application of the EANN model to improve the certainty of flood prediction in coastal cities. It proposes the HNPSFS method to select appropriate inputs, uses a coupling method to select the network architecture and parameters, and uses nonlinear neural integration to capture the uncertainty of the output. In this way, it provides an objective approach, based on the neural network itself, for reducing model uncertainty and improving prediction performance. The model achieves accurate results in the early prediction of floods in the Macao region of China, and its output can provide effective guidance for city managers when issuing flood warnings. It was also found that ANN models trained on short datasets sampled at appropriate time intervals can perform as well as or better than models trained on long datasets.