1 Introduction

With the rapid expansion of cities worldwide, urban development is facing an increasing number of problems. The idea of a smart city is considered an effective solution, which can solve management problems, guarantee citizens’ quality of life and support sustainable development [1]. The Internet of Things (IoT) and big data analysis technology are generally regarded as the main elements with which to implement smart city services, which has raised global interest [2, 3]. The IoT can integrate various technologies, such as sensor networks, big data analysis and cloud computing. It has realized many functions, such as connecting intelligent devices, perceiving the external environment, collecting data, formulating results and effectively communicating with human beings or other digital devices [4]. These technologies have been widely used in various areas, such as smart cities, intelligent medical treatment, intelligent early warning systems and disaster management [5], which have promoted intelligent service innovation in different sectors so as to build technology-driven cities [6].

Flood risk is becoming a major urban challenge. In particular, coastal cities are much more easily affected by the potential impact of climate change, and floods are becoming increasingly frequent and serious [7]. Coastal cities are close to the ocean and are on low-lying terrain. Large-scale events with high return periods, such as typhoons or tsunamis, and events with low return periods, such as heavy precipitation, may cause flooding [8, 9]. All countries attach great importance to flood control and prevention. In actual urban flood management, accurate forecasting of flood depth is very important [10], and can help city managers to rapidly issue flood warnings and deploy flood control measures in advance [11, 12]. In cities with smart rainwater systems, accurate flood forecasting can help managers to choose the best course of action by which to reduce their drainage system pressure, which could be instrumental for outlining a long-term strategy of flood prevention for smart cities [13].

Currently, the IoT is widely used in flood monitoring, forecasting and management. As a part of an intelligent flood forecasting system, it helps to forecast and warn people about potential floods [14]. Various intelligent sensors related to hydrological information technology have been embedded in different urban areas, including sensors for monitoring changes in flood water levels, such as depth meters [15], hydrological monitoring cameras [16], smart water level gauges [17] and the combination of various sensors, such as hydro-meteorological sensor nets [18] and automatic weather stations [19], which collect huge amounts of data. The recent growth of flood-related data, the development of intelligent algorithms and the further application in hydrology have led to the rapid development of data-driven hydrological forecasting, including machine learning (ML), deep learning (DL) and computing infrastructure [17, 20]. However, under non-stationary conditions, the flood inundation process is highly complex and uncertain, so there are still difficulties and challenges in accurate flood forecasting [21, 22]. The ensemble modeling method has been proven to improve the accuracy of hydrological models [23]. In this paper we study the application of ensemble learning technology to forecasting the coastal flooding in IoT-enabled smart cities in advance. This study does not involve technology for drawing flood maps, which is applied in urban disaster early warning systems [24]. 
More specifically, the contributions of this study are as follows: (i) to discuss obtaining data from the IoT system of flood monitoring to provide basic data for research; (ii) to discuss technology used in previous research and the complexity and uncontrollability of urban flood forecasting; (iii) to further discuss the application of the linear regression (LR), support vector machine (SVM), backpropagation neural network (BPNN) and random forest (RF) models to predict coastal urban flooding; (iv) to adopt the Bayesian model combination (BMC) strategy to integrate the predicted values of the BPNN and RF models to build an ensemble model that is more reliable than a single one; (v) to apply the intensity level of flood depth in two new ways: (1) a sampling process combining flood intensity classification and cross-validation is proposed to solve the problem of data imbalance; (2) the proportion of correct predictions at each flood intensity level is computed and used to evaluate the reliability of the model.

The remainder of this article is organized as follows. Section 2 provides a discussion on work related to urban flood forecasting. Section 3 describes the methods for flood forecasting, including system descriptions, methodology and ensemble learning technology. Section 4 describes the research area, the data set structure, the training subset sampling, the set parameters, the conducted experiments and the main results. Section 5 provides a conclusion and discusses future follow-up work.

2 Related Work

Urban flood forecasting can use physically based 1D, 2D and 1D–2D models [25,26,27,28]. These models combine topographic parameters, hydrological monitoring data and other variables to represent different physical environments, which incurs a high computing cost. By contrast, data-driven models do not need to simulate the physical process [29]; they use ML technology to learn from the data and map the model input to the output. That is, flood factor data are fed into the model, the model learns from them, and it then outputs the flood depth value. As a result, the development of urban flood forecasting models is more flexible, and the computing time is shorter. Some studies have used logistic regression for flood forecasting. In [30], the authors establish an LR model to forecast urban composite floods (high tide and high precipitation) and the sewer overflow caused by them. In [31], the authors use the LR model to identify the determinants of urban flood risk perception and guide managers in the design of flood risk management plans. Neural networks are widely used as a powerful method of nonlinear system modeling. For instance, an integrated neural network model is proposed in [32], which can realize real-time urban flood forecasting and provide a maximum water-level map within a few seconds. In [33], compared with BPNN and Elman neural network experiments, recurrent neural network forecasting performs better, and realizes multi-step urban flood forecasting 10–60 min in advance. In [34], the combination of NN and K-means clustering is proposed; in [35], the combination of BPNN and principal component analysis methods is proposed; and in [36], the combination of a dynamic neural network and a self-organizing map is proposed. These three hybrid network models can predict urban inundation depth 1–3 h in advance. For urban disaster prevention, a flood forecasting model with a longer forecast period may be more useful.
In [37], a two-stage inundation forecasting method based on a SVM, which realizes prediction 1–6 h in advance during typhoons, is proposed. In [38], a weakly labeled SVM model for assessing urban flood sustainability is proposed, which shows a better performance than the LR and SVM models, and can be used to identify flood-prone areas with limited flood inventories. The DL model can capture complex patterns hidden in data sets and obtain a more accurate prediction. In [39], the authors suggest that flood forecasting accuracy can be improved by a long short-term memory network to correct the average precipitation forecast. In [10], a regression model is constructed for the depth prediction of urban flood areas with a gradient-boosting decision tree (GBDT) algorithm to forecast the flood depth of different precipitation return periods. In [40], an urban flood monitoring and forecasting platform is designed, which adopts hybrid DL and fuzzy logic technology to accurately predict the flood risk at street level. Although data-driven models have been widely used in hydrology, there are still relatively few studies using this method to simulate floods in coastal cities [41].

3 Method Description

3.1 System Description

An IoT system for flood monitoring was developed and applied, which overcomes the limitations of data collection, reduces the time complexity of real-time collection and improves the efficiency of hydrological information sharing and processing. As shown in Fig. 1, a basic framework based on the IoT is used for flood informatization, including application software, data services and infrastructure, which includes a variety of sensors, workstations and IoT devices. Among them, the weather station sensor can collect weather data, such as urban precipitation, wind direction, wind speed, humidity and temperature. The hydrological station sensor can collect hydrological data, such as flood depth, wave height, tide height and inundation images. A low-cost water level gauge with a wireless sensor is easy to deploy in any street to collect inundation data. These data are transmitted to IoT cloud storage through the gateway and provide data services to the community and the public. The data collected by the IoT are integrated with other terrain and typhoon path data to form a flood data set, so as to help researchers build models with higher accuracy and realize urban flood simulation and forecasting [42,43,44].

Fig. 1 Structure of IoT system for flood forecasting

3.2 Methodology

This research is devoted to the services of flood forecasting in the IoT system, and its purpose is to develop an ML method to predict the flood depth of coastal cities. This requires the following steps: (1) collect and mark data, and consider the diversity when collecting data (collect data at different places and accurate times that reflect the flood process); (2) analyze and pre-process the data set; the training results of the ML model are highly dependent on the data quality, but the data set is inherently unbalanced, and samples of input models require diversity and uniform sampling; (3) apply a set of ML models to urban coastal flood forecasting that is considered suitable for capturing the nonlinear flood relationship; (4) design a Bayesian model combination-based ensemble learning method (BMC-EL), verify the effectiveness of the method through experiments; (5) use the mean squared error (MSE), the mean absolute error (MAE), the root-mean-square error (RMSE) and the coefficient of determination (R2) and calculate the accuracy ratios at different flood intensity levels. As far as we know, the above-mentioned methods and statistical analyses have not previously been applied to flood forecasting in coastal cities.
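Step (5) mentions accuracy ratios at different flood intensity levels. The following is a minimal sketch of one way such a ratio could be computed. It assumes that a prediction counts as correct when it falls in the same intensity level as the observation, and it uses the depth thresholds of the Macao classification (0.1, 0.5, 1.0, 1.5 and 2.5 m). The function names and the toy depths are hypothetical and are not taken from the study.

```python
from bisect import bisect_right
from collections import Counter

# Upper depth bounds (m) of levels 0..4; deeper water is level 5.
LEVEL_BOUNDS_M = [0.1, 0.5, 1.0, 1.5, 2.5]


def intensity_level(depth_m):
    """Map a flood depth in metres to an intensity level 0..5."""
    return bisect_right(LEVEL_BOUNDS_M, depth_m)


def accuracy_ratio_by_level(observed, predicted):
    """Share of samples per observed level whose predicted level matches.

    Assumption: "correct" means the predicted depth falls in the same
    intensity level as the observed depth.
    """
    totals, hits = Counter(), Counter()
    for obs, pred in zip(observed, predicted):
        lvl = intensity_level(obs)
        totals[lvl] += 1
        if intensity_level(pred) == lvl:
            hits[lvl] += 1
    return {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}


# Toy depths in metres (illustrative only).
observed = [0.05, 0.05, 0.30, 0.70, 1.20]
predicted = [0.02, 0.20, 0.40, 0.40, 1.30]
ratios = accuracy_ratio_by_level(observed, predicted)
print(ratios)
```

Reporting the ratio per level rather than overall keeps the rare deep-flood levels from being hidden by the very large number of level 0 samples.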

3.3 Ensemble Learning Technology

In research on flood inundation depth estimation, the reliability of the results is as important as accuracy. A forecast that is higher than the actual situation can easily cause public panic and excessive prevention measures. Conversely, a forecast that is lower than the actual situation can easily cause citizens to relax their vigilance and neglect disaster preparedness. Forecasting with a single ML method such as the LR, ANN or SVM algorithm makes it difficult to balance the training set error against the generalization error, and can easily cause overfitting or underfitting [45]. Ensemble learning can improve reliability. It combines multiple base learners, often achieving better generalization performance and reliability than a single one [46]. In this study we employed two widely used models, the BPNN and RF, as base learners and integrated their results with BMC to form a strong learner. This is because both the BPNN and RF are unstable base learners: when the training samples change slightly, the results are significantly different. Therefore, when there are subtle changes in the flood factor data, these changes are keenly caught by the BPNN and RF models, and directly affect the prediction results.

The flow chart of the integrated prediction method based on BMC is shown in Fig. 2.

Fig. 2 The structure chart of BMC-EL model

The specific steps are as follows:

(1) Introduce the classification of flood intensity levels, perform classification on the training set and generate six categories \(\{{C}_{1},{C}_{2},{C}_{3},{C}_{4},{C}_{5},{C}_{6}\}\).

(2) Perform tenfold cross-validation for each class \({C}_{n}\), and generate 10 training subsets \(\{{D}_{1},{D}_{2},\ldots ,{D}_{10}\}\) in turn. First, class \({C}_{1}\) is split at random into 10 bags \(\{{b}_{1},{b}_{2},\ldots ,{b}_{10}\}\). \(\{{b}_{2},{b}_{3},\ldots ,{b}_{10}\}\) is imported into training subset \({D}_{1}\), \(\{{b}_{1},{b}_{3},\ldots ,{b}_{10}\}\) is imported into \({D}_{2}\), and so on, leaving out one bag each time, until \(\{{b}_{1},{b}_{2},\ldots ,{b}_{9}\}\) is imported into \({D}_{10}\). Similarly, classes \(\{{C}_{2},{C}_{3},\ldots ,{C}_{6}\}\) are also tenfold cross-validated, and nine bags of each class are imported into each of the training subsets \(\{{D}_{1},{D}_{2},\ldots ,{D}_{10}\}\).

(3) Use multiple BPNNs to build a regression model and train 10 BPNN models with the 10 training subsets; use multiple classification and regression trees (CARTs) to build random forests and train 10 RF models with the 10 training subsets.

(4) Apply the 10 trained groups of BPNNs and 10 trained groups of RFs to the validation set and output 20 groups of predictive output values \(({y}_{1},{y}_{2},\ldots ,{y}_{20})\). Supposing that the real output value of the validation set is \(y\), the matrix \((\mathrm{y},{y}_{1},{y}_{2},\ldots ,{y}_{20})\) is built and imported into the BMC. The combination strategy is then determined according to the predictive performance of the 20 groups of algorithms on the validation set.

(5) Apply the 10 trained groups of BPNNs and 10 trained groups of RFs to the testing set to output their predicted values \(({y}_{1}{^\prime},{y}_{2}{^\prime},\ldots ,{y}_{20}{^\prime})\), and obtain the final predicted values of flood depth according to the model integration strategy of BMC.
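Steps (1) and (2) can be illustrated with a short sketch. It assumes that each class is shuffled, cut into 10 bags, and that training subset k receives every bag except bag k, which is kept as the matching validation subset. The function name, the seed and the toy class sizes are hypothetical, and this is not the authors' MATLAB implementation.

```python
import random
from collections import defaultdict


def stratified_training_subsets(levels, n_folds=10, seed=0):
    """Build n_folds (train_idx, val_idx) pairs class by class.

    levels[i] is the flood intensity level of sample i. Each class is
    shuffled and cut into n_folds bags; subset k gets all bags of every
    class except bag k, which goes to validation subset k.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, level in enumerate(levels):
        by_class[level].append(idx)

    train = [[] for _ in range(n_folds)]
    val = [[] for _ in range(n_folds)]
    for level in sorted(by_class):
        members = by_class[level][:]
        rng.shuffle(members)
        bags = [members[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            val[k].extend(bags[k])
            for j in range(n_folds):
                if j != k:
                    train[k].extend(bags[j])
    return list(zip(train, val))


# Toy imbalanced label vector: 200 level-0, 40 level-1 and 20 level-2 samples.
levels = [0] * 200 + [1] * 40 + [2] * 20
subsets = stratified_training_subsets(levels)
train_idx, val_idx = subsets[0]
print(len(subsets), len(train_idx), len(val_idx))
```

Because the split is done inside each class, every training subset keeps samples of the rare high-intensity levels, which plain random sampling over the whole training set would not guarantee.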

3.3.1 BPNN Base Learner

The multi-layer perceptron (MLP) is a feed-forward neural network composed of multiple perceptron layers. The BPNN is the most widely used neural network; it performs well in nonlinear problems and is particularly suitable for hydrological forecasting. Structurally it is an MLP, and it uses the backpropagation (BP) algorithm to train the network, usually combined with optimization methods (such as gradient descent). A typical BPNN is a three-layer static network, consisting of the input layer, the hidden layer and the output layer. Each layer consists of several elements, including neurons and the weights and biases that link them. In this study, based on a series of exploratory experiments, the optimal model structure of the BPNN was designed as shown in Fig. 3. The BPNN model consists of an input layer (with 34 input variables; specific input variables shown in Table 2), an output layer (with one output variable; namely, the flood depth) and a hidden layer (with 69 neurons, that is, twice the number of input neurons plus one). The model output D(t + 1) is the predicted flood depth after 1 h, and the backpropagation algorithm is used to train the BPNN. The training time of a single BPNN model exceeds 220 min, so the training algorithm that adapts the weights needs to be optimized to shorten the computation time. The transfer function of the hidden layer is S-type (sigmoid), and the transfer function of the output layer is linear [33].
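As an illustration of the 34-69-1 structure described above, the following sketch builds the layer sizes and runs one forward pass. It assumes a hyperbolic tangent as the S-type hidden transfer function and small random initial weights. Both are assumptions, as are all names in the code. Training by backpropagation is omitted.

```python
import math
import random

N_INPUTS = 34                  # flood factors used as inputs
N_HIDDEN = 2 * N_INPUTS + 1    # 69 hidden neurons
N_OUTPUTS = 1                  # flood depth D(t + 1)


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden, output):
    """Sigmoid-type hidden layer (tanh assumed here), linear output layer."""
    h = dense(x, hidden, math.tanh)
    return dense(h, output, lambda z: z)


rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(N_INPUTS)]   # normalized inputs
depth = forward(x, hidden, output)[0]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in (hidden, output))
print(N_HIDDEN, n_params)
```

With these sizes the network has 34 × 69 + 69 weights and biases in the hidden layer and 69 + 1 in the output layer, 2485 trainable parameters in total.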

Fig. 3 The structure chart of the BPNN model

3.3.2 RF Base Learner

Another base learner is the RF based on CART. The pruning operation in RF can effectively reduce the risk of overfitting; the method is simple and efficient, easy to implement and has low computing overhead. It shows strong performance in many classification and regression problems. The RF model has 34 flood factors as input variables and a single output variable, the flood depth. The definition of these variables is the same as that of the BPNN model. As shown in Fig. 4, the RF algorithm flow includes:

(1) Use the bootstrap method to select n samples from the training set as a sub-sample set.

(2) Use the sub-sample set to generate a CART. When generating each node, multiple features are randomly selected without repetition, and these features are used to divide the sub-sample set. In addition, the Gini coefficient, gain rate or information gain judgment are used to find the best division feature.

(3) Repeat steps 1 and 2 a total of k times; k is the CART number of the RF.

(4) Use the trained RF to predict the test sample; the average CART predicted value is taken as the result.

Fig. 4 Random forest algorithm flow
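The four steps of the RF flow can be sketched in plain Python as follows. This is an illustrative toy, not the MATLAB implementation used in the study. Because the target is a continuous flood depth, the sketch scores splits by the sum of squared errors instead of the Gini coefficient or information gain, and it omits pruning. The function names, the depth limit, the minimum leaf size and the toy data are assumptions.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def build_cart(X, y, rng, n_try, depth=0, max_depth=6, min_leaf=2):
    """Grow one regression tree; a leaf is a float, a node is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_try):   # random features, no repetition
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return sum(y) / len(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_cart([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_try, depth + 1, max_depth, min_leaf)

    return (f, t, grow(left), grow(right))


def predict_cart(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, k=40, n_try=2, seed=0):
    """Steps 1 to 3: k bootstrap sub-sample sets, one CART per set."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap with replacement
        forest.append(build_cart([X[i] for i in idx], [y[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, x):
    """Step 4: average the CART predictions."""
    return sum(predict_cart(tree, x) for tree in forest) / len(forest)


# Toy data: three features, target depends on the first one only.
X = [[i / 50, (7 * i % 50) / 50, (13 * i % 50) / 50] for i in range(50)]
y = [2.0 * row[0] for row in X]
forest = fit_forest(X, y, k=40, n_try=2)
low = predict_forest(forest, [0.1, 0.5, 0.5])
high = predict_forest(forest, [0.9, 0.5, 0.5])
print(len(forest), low < high)
```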

Due to the attribute perturbation of the RF algorithm itself, it converges to a lower generalization error only when the number of CARTs reaches a certain level. The prediction samples are imported into a single RF algorithm, and the relationship between the out-of-bag data (OOB) error and the number of regression trees is shown in Fig. 5. When the number of CARTs reaches 40, the OOB error tends to converge. Thus, we set the number k of CARTs to 40. For the given problem, the training time of a single RF model takes 3 min.

Fig. 5 Error estimation of random forest regression model

3.3.3 BMC

Ensemble learning can improve the reliability of prediction methods by combination. Bayesian model averaging (BMA) is regarded as a standard method for combining base learners in ensemble learning. BMA takes the posterior probability of the model’s prediction accuracy on the validation set as the model weight, and assigns reasonable weights to multiple base learner models. These models are combined to reduce the risk caused by the uncertainty and singularity of a single model. Consider a given flood depth dataset \(D\) that contains multiple samples \({d}_{i}\); the predicted value of flood depth output by the base learner is \({x}_{i}\); the true value of flood depth is \({y}_{i}\); \({x}_{i}\) and \({y}_{i}\) constitute sample \({d}_{i}\). The model space \(H\) is approximated by a finite number of individual hypotheses, and \(h\) is an individual hypothesis in the model space. Then, the posterior distribution of \({y}_{i}\) is as follows:

$$p\left({y}_{i}|{x}_{i},D,H\right)=\sum _{h\in H}p\left({y}_{i}|{x}_{i},h\right)p\left(h|D\right).$$
(1)

In the formula, \(p\left({y}_{i}|{x}_{i},D,H\right)\) is the weighted average of the predictive distributions estimated by all individual hypotheses; \(p\left({y}_{i}|{x}_{i},h\right)\) is the predicted distribution of \({y}_{i}\) under the individual hypothesis \(h\); \(p\left(h|D\right)\) is the posterior probability of individual hypothesis \(h\) given data set \(D\). However, BMA using the computing method of integral likelihood estimation can easily cause overfitting and poor performance [47]. To benefit from the inherent advantages of ensemble learning more efficiently in flood forecasting experiments, the combination strategy should focus on reflecting the complementary advantages of the individual hypotheses, instead of only finding the optimal hypothesis through BMA. To address the above problems, the hypothesis space \(E\) of model combinations is added to BMA to establish BMC, and formula (1) is changed into (2):

$$p\left({y}_{i}|{x}_{i},D,H,E\right)=\sum _{e\in E}p\left({y}_{i}|{x}_{i},H,e\right)p(e|D).$$
(2)

In the formula, \(e\) is an individual combination hypothesis in the combined model space \(E\). This correction overcomes BMA’s tendency to assign the entire weight to a single hypothesis \(h\) [48]. The diagram of BMA and BMC is shown in Fig. 6.

Fig. 6 Ensemble learning method: a Bayesian model averaging, b Bayesian model combination
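Formula (2) can be approximated numerically by sampling a finite set of candidate combinations \(e\). The sketch below is one possible reading, not the procedure used in the study. It assumes that each \(e\) is a weight vector drawn from a uniform Dirichlet distribution, that the likelihood of the validation data is Gaussian with a fixed noise level `sigma`, and that all candidate combinations are equally likely a priori. All names and the toy numbers are hypothetical.

```python
import math
import random


def bmc_weights(preds, y_true, n_combos=500, sigma=0.1, seed=0):
    """Approximate BMC weights for base learners on a validation set.

    preds[k][i] is the prediction of base learner k for sample i.
    Each sampled combination e is scored by a Gaussian log-likelihood,
    and the posterior p(e | D) is used to average the combinations.
    """
    rng = random.Random(seed)
    n_models, n = len(preds), len(y_true)
    combos, log_liks = [], []
    for _ in range(n_combos):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n_models)]
        total = sum(g)
        e = [v / total for v in g]                     # Dirichlet(1, ..., 1) draw
        sq_err = sum((y_true[i] - sum(e[k] * preds[k][i] for k in range(n_models))) ** 2
                     for i in range(n))
        combos.append(e)
        log_liks.append(-sq_err / (2.0 * sigma ** 2))
    top = max(log_liks)
    post = [math.exp(v - top) for v in log_liks]       # stable softmax
    norm = sum(post)
    post = [p / norm for p in post]
    # The combination is linear, so the posterior average over e
    # collapses to one weight per base learner.
    return [sum(post[j] * combos[j][k] for j in range(n_combos))
            for k in range(n_models)]


def bmc_predict(weights, preds_new):
    """Weighted sum of base learner predictions on new samples."""
    return [sum(w * p[i] for w, p in zip(weights, preds_new))
            for i in range(len(preds_new[0]))]


# Toy validation data: one accurate learner, one biased by +0.5 m.
y_val = [0.1 * i for i in range(20)]
good = [v + 0.01 for v in y_val]
bad = [v + 0.50 for v in y_val]
weights = bmc_weights([good, bad], y_val)
fused = bmc_predict(weights, [[1.0, 2.0], [1.5, 2.5]])
print(weights, fused)
```

Unlike BMA, which tends to concentrate the weight on one base learner, the posterior here is taken over whole weight vectors, so the final prediction can remain a genuine mixture of base learners.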

4 Experiments

4.1 Research Area and Flood Data

Considering the vulnerability to flooding and the availability of flood record data, Macao, a coastal city in China, as shown in Fig. 7a, is an ideal research area for this study. Located at the estuary of the Pearl River, the largest river in Southern China, Macao is among the most vulnerable cities to coastal floods in China. When a tropical cyclone in the Northwest Pacific approaches, typhoons bring heavy precipitation and storm surges, which often overlap with astronomical tides and cause seawater to inundate urban areas [49]. As an international city and a special administrative region of “one country, two systems” in China, flood risk in this region is very important to China’s economy and politics.

Fig. 7 IoT equipment for flood monitoring in Macao: a location of the study area, b smart water level sensors, and c water level monitoring station

Macao is small in area (32.9 square kilometers), is densely covered with buildings and streets, and has a high rate of ground hardening. Due to long-term flooding, Macao has built a complete drainage infrastructure, including a huge number of sewers and drainage pumping stations. In some streets that are often flooded, the Macao Meteorological Bureau has set up 17 flood monitoring points, and this number is still increasing. Before 2020, smart water-level gauges (Fig. 7b) with wireless sensors were installed and used. The gauges collect one water-level reading every minute and transmit it to the monitoring center of the Meteorological Bureau through a wireless network [50]. After 2020, a water level monitoring station (Fig. 7c) was installed and used. It can collect flood depth, air humidity, temperature, precipitation and wind values in flooded areas, and has a high-definition camera to record street images. Owing to IoT technology, the amount of flood-related data and information collected, transmitted and stored has increased, providing richer research data for flood warning in Macao.

The equipment continuously collects data relating to flood depth and flood factors, forming a flood data set on the order of ten million records. Through the analysis of these data and the flood inundation records released by the Macao Meteorological Bureau, it can clearly be seen that the floods in Macao occur from August to October every year, and are closely related to the typhoon events within 800 km of Macao. A total of 12,900 min of manually filtered flood depth data from seven typhoon events was selected for this study. The flood depth data for most blocks in Macao for this period were completely and clearly collected by water level sensors, with high data quality and good usability. Table 1 shows the relevant information for seven typhoons from 2016 to 2020; the most serious one was 1713 HATO, with a maximum submergence depth of 3.47 m. The least serious was 1720 KUANUM, with a maximum submergence depth of 0.05 m. In this paper, the flood data for six typhoons were selected as the training set. 1822 MANGKHUT was used as the test set because its maximum submergence depth of 2.09 m is the closest to the 3.47 m of 1713 HATO, so it can better test the forecasting ability of models for deeper floods. The test set data were only used to test the model performance and not for model training.

Table 1 Typhoon case of model training and testing

Urban coastal floods are affected by different factors, such as the environment, geography and human factors. Environmental factors include typhoons, waves, tides, precipitation, wind and groundwater levels. Geographical factors include urban geographical elevation, coastal distance, soil characteristics and landform. Human factors include surface hardening conditions and rainwater infrastructure. The complex causes of floods in coastal cities make flood modeling and prediction difficult. Data-driven models do not need to simulate physical processes, but an understanding of these processes can influence the selection of model input variables [51]. Based on our understanding of the physical flood process, we selected the following flood factors. The optimal typhoon path data (typhoon intensity level, typhoon center longitude and latitude, typhoon center pressure and wind speed) were taken from the tropical cyclone data center of China Meteorological Administration, with a data interval of 3 or 6 h. The urban weather (precipitation, wind speed and direction) and coastal tides on the west and north sides of the city were collected by IoT devices, with a data interval of 1 h. The urban terrain data correspond to the geographic elevation of nine inundated blocks, provided by the Macao Meteorological Bureau. The flood depth of inundated blocks was also used as an input because it reflects the urban surface runoff and infrastructure drainage capacity. Table 2 shows the flood factor information used as the input of the model; the range values are the minimum and maximum values obtained for the seven typhoons. Through interpolation processing, the flood factor data were brought to a 1 min interval, the same time interval as the flood depth data, and were normalized to [−1, 1].
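The preprocessing described in the last sentence can be sketched as follows. The interpolation method is not stated in the text, so linear interpolation is assumed here. The normalization is the usual min-max mapping onto [−1, 1] using the range values of each factor. The function names and the hourly tide values are hypothetical.

```python
def interpolate_to_minutes(times_min, values):
    """Linearly interpolate samples at integer minute stamps onto a 1 min grid."""
    series = []
    for k in range(len(times_min) - 1):
        t0, t1 = times_min[k], times_min[k + 1]
        v0, v1 = values[k], values[k + 1]
        for t in range(t0, t1):
            series.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    series.append(values[-1])
    return series


def normalize_to_unit_range(values, lo, hi):
    """Min-max mapping of [lo, hi] onto [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Toy hourly tide heights in metres at minutes 0, 60 and 120.
tide = interpolate_to_minutes([0, 60, 120], [1.0, 1.6, 1.3])
scaled = normalize_to_unit_range(tide, lo=1.0, hi=1.6)
print(len(tide), scaled[0], scaled[60], round(scaled[-1], 6))
```

Using the range of each factor over all typhoons, rather than the range of a single event, keeps the scaling identical for the training and test data.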

Table 2 A summary of the flood factors used for this research

4.2 Diversity of Training Subsets

The performance of the ML model is highly dependent on the learned data [52]. Flood depth is different in different typhoons, and flood factor data also present differences. However, traditional random sampling leads to the uneven distribution of samples in training subsets. To address the imbalance in the dataset, flood intensity classification was introduced to classify the training set, and then K-fold cross-validation was performed to increase the diversity of the training subsets.

As shown in Table 3, according to the classification of flood intensity by the Macao Meteorological Bureau, when the flood depth above the road surface is < 0.1 m, the flood level of the submerged area is at the 0 level, and the manager does not need to issue a warning. When the flood depth above the road surface is < 0.5 m, 1.0 m, 1.5 m, 2.5 m and more than 2.5 m, the flood level is 1–5, and the manager needs to issue blue, yellow, orange, red and black warnings. The higher the warning level, the higher the flood depth, and the more serious the disaster may be.

Table 3 Classification of flood intensity in Macao
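The thresholds stated in the text map directly onto a small lookup. The sketch below is illustrative. The treatment of depths that fall exactly on a boundary is an assumption, since the text specifies only "less than" and "more than", and the names are hypothetical.

```python
from bisect import bisect_right

# Upper depth bounds (m) of levels 0..4; deeper water is level 5.
LEVEL_BOUNDS_M = [0.1, 0.5, 1.0, 1.5, 2.5]
WARNING_COLOURS = ["none", "blue", "yellow", "orange", "red", "black"]


def flood_level(depth_m):
    """Return the flood intensity level (0..5) for a depth in metres.

    Assumption: a depth exactly on a boundary is assigned to the higher level.
    """
    return bisect_right(LEVEL_BOUNDS_M, depth_m)


def warning_colour(depth_m):
    """Return the warning colour issued for a given depth."""
    return WARNING_COLOURS[flood_level(depth_m)]


# Maximum submergence depths quoted in the text for three typhoons.
for depth in (0.05, 2.09, 3.47):
    print(depth, flood_level(depth), warning_colour(depth))
```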

For the integrated model, the larger the difference between the base learners used for combination, the better the performance of the ensemble learning. Increasing the diversity of training subsets increases the differences in the input samples of the base learners. According to Table 3, the flood data in the training set are classified based on intensity levels. There are 104,220 flood depth data in the nine submerged areas, which have been divided into six categories with uneven distribution, as shown in Table 4. There are 102,041 0-level data, accounting for 97.91%; 1648 1-level data, accounting for 1.58%; and 531 2-level and above data, accounting for 0.51%.

Table 4 Distribution of flood intensity in nine submerged areas

According to the algorithm steps of BMC-EL, 10 training subsets should be generated. The first step is to introduce the classification of flood intensity, and the training set is divided into six categories \(\{{C}_{1},{C}_{2},\ldots ,{C}_{6}\}\). The second step is to perform tenfold cross-validation for each category \({C}_{n}\), and generate 10 training subsets \(\{{D}_{1},{D}_{2},\ldots ,{D}_{10}\}\) in turn.

The original flood factor data were normalized, and a 3D coordinate system was established by selecting the distance between the typhoon center and meteorological station, the urban wind speed and the western urban tide. The sample distribution of the training set in the 3D coordinate system is shown in Fig. 8a. The sample distribution of one of the training subsets is shown in Fig. 8b, and the sample distribution of the corresponding validation subset is shown in Fig. 8c.

Fig. 8 Distribution of different samples: a training set, b training subset, and c validation subset

Classification prior to cross-validation ensures that each training subset contains flood factor data corresponding to every flood intensity level, which ensures uniform sampling and increases diversity. Because tenfold cross-validation is used, the sample size of each training subset is 90% of the training set, and the sample size of each validation subset is 10% of the training set. Figure 8 indicates that the sampling process does not affect the distribution regularity of flood factors.

4.3 Parameter Setting

All simulations were done on the same desktop PC (CPU: i5 3.2 GHz, RAM: 32 GB). Parameter setting and model training were completed with the help of the ML and DL toolboxes of MATLAB 2019b. Various ML methods can be easily implemented with the help of toolboxes and built-in functions. In this study, six different ML models were used for comparative experiments, and the ML adjustment mainly depends on the author’s exploratory experiment. The functions and parameter values used are shown in Table 5. For BMC-EL, the parameters of its two base learners are the same as those of the single BPNN or RF model above.

Table 5 Parameter setting for ML models

4.4 Results and Discussion

4.4.1 Model Performance

The experiments included model training and testing. Single LR, LSVM, QSVM, RF and BPNN models were trained with the complete training set, while the BMC-EL model was trained with the 10 above-mentioned training subsets. The test set uses flood data from Typhoon 1822 MANGKHUT. These data are independent and were not used in the previous training stage. The MSE, RMSE, MAE and coefficient of determination (R2) were the measures used to evaluate the accuracy of the developed models.
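For reference, the four measures can be computed as in the sketch below, which uses their standard textbook definitions. Whether the study's toolbox applies exactly these conventions (for example, averaging per sample) is an assumption. The function name and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy flood depths in metres (illustrative only).
metrics = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
print(metrics)
```

Under these definitions the RMSE is the square root of the MSE.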

Table 6 shows the test performance of the developed ML models. The R2 values of the six models are all close to 1, indicating a good fitting effect. These models are considered suitable for capturing the nonlinear relationship of floods. The best prediction model is BMC-EL, with an MSE value of 0.0072, an MAE value of 528.40 and an RMSE value of 0.0847, which are clearly smaller than those of the other models. The next best model is RF, with an MSE value of 0.0247, an MAE value of 976.47 and an RMSE value of 0.1571. The performance of the two integrated models is clearly better than that of the other four, which is consistent with previously reported results; that is, the integrated model has the ability to improve the accuracy of hydrological models [23]. The BPNN does not perform well: its R2 value is 0.9870, and its MSE value of 0.1151, MAE value of 2071.19 and RMSE value of 0.3392 are about 16, 4 and 4 times those of BMC-EL, respectively. In this study, the BPNN with 34 input variables requires long-term training, and it is difficult to adjust its hyperparameters. Thus, the next step is to use optimization algorithms to improve its performance. The regression performance of QSVM (MSE = 0.1178; MAE = 1941.94; RMSE = 0.3432) is not significantly different from that of LSVM (MSE = 0.1593; MAE = 2627.34; RMSE = 0.3991) and LR (MSE = 0.1597; MAE = 2258.21; RMSE = 0.3996).

Table 6 Performance index of the different ML models

The scatter plot of flood prediction for these models is shown in Fig. 9. The scatter distribution also shows that the BMC-EL performs better than the other models: the distribution trend of its sample data is consistent with the diagonal line, and the deviation of the sample points from the diagonal line is smaller. The distribution of the RF sample data is relatively scattered; a small amount is near the diagonal line, but most is above it. The sample data for the other four models, BPNN, QSVM, LSVM and LR, are well clustered, but the distribution trend is generally above the diagonal line. This shows that their predicted values are generally higher than the actual observed values. The cause is that some future information, which is actually unknown at the present moment, is used in their modeling processes [53, 54]. These models have poor extrapolation performance and are not suitable for practical applications. Thus, concerning accuracy, BMC-EL is the best, and the evaluation values confirm its ability to express the changes in the flood depth of typhoons.

Fig. 9 Scatter plots of flood prediction: a LR, b LSVM, c QSVM, d BPNN, e RF and f BMC-EL

4.4.2 Prediction Results

The predicted and observed values of flood depth in nine submerged areas in Macao are plotted in Fig. 10. The predicted values of BMC-EL clearly follow the same amplitude changes as the observed values, which greatly reduces the time delay at the peak and allows the flood depth to be forecast effectively 60 min in advance. The predicted values of the other models are higher than the observed values and show a certain degree of delay. In areas 3 (Fig. 10c) and 4 (Fig. 10d), the BPNN and LSVM show flood peaks ahead of the actual time.
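Peak delay and peak advance can be quantified by comparing the time index of the maximum in each series. The following sketch assumes evenly sampled series; the 10 min sampling step, the function name and the depth values are hypothetical and serve only to illustrate the sign convention.

```python
import numpy as np

def peak_offset_minutes(observed, predicted, step_minutes):
    """Offset of the predicted peak relative to the observed peak.

    Positive: the predicted peak arrives later (delay).
    Negative: the predicted peak arrives earlier (advance).
    """
    obs_peak = int(np.argmax(observed))
    pred_peak = int(np.argmax(predicted))
    return (pred_peak - obs_peak) * step_minutes

# Hypothetical depth series in metres, sampled every 10 min (assumed step).
observed = [0.1, 0.4, 0.9, 1.2, 0.8, 0.3]
delayed = [0.1, 0.2, 0.5, 1.0, 1.3, 0.7]
early = [0.3, 1.0, 0.9, 0.6, 0.3, 0.1]

delay = peak_offset_minutes(observed, delayed, step_minutes=10)
advance = peak_offset_minutes(observed, early, step_minutes=10)
print(delay, advance)
```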

Fig. 10 The comparison of predicted and observed flood depth value: a area 1, b area 2, c area 3, d area 4, e area 5, f area 6, g area 7, h area 8 and i area 9

BMC-EL performs well in flood forecasting in different areas in Macao. Concerning the topography (Fig. 7a, location of the study area), areas 1, 2, 5, 6, 8 and 9 are geographically closer to rivers joining the sea, and high water levels caused by storm surges are most likely to cross the coastal protective dam and flood these areas. Areas 3, 4 and 7 are further inland, where flooding is most likely due to large road runoff and the insufficient capacity of the drainage network during heavy precipitation. The geographic elevation of areas 7 and 8 is higher than that of the other seven areas, so they are less likely to be flooded, while the other seven areas are more vulnerable to flooding. BMC-EL achieved high-accuracy prediction in each submerged area, showing good robustness and extrapolation ability. For model domains of smaller spatial extent in Macao [55], the computation time of a prediction based on the trained BMC-EL model can be as short as 15 s.

4.4.3 Prediction Error

The prediction error of each model in the nine submerged areas is shown in Fig. 11. The error reaches its maximum when the flood peak appears. The maximum prediction error of BMC-EL is 0.3 m, followed by RF with 0.8 m. The maximum prediction errors of the BPNN, QSVM, LSVM and LR are 1.5–1.7 m. The largest prediction errors occur in areas 5 and 6, which are the most affected areas in Macao. One possible reason is that there are too few training samples with a high flood depth: as shown in Table 4, there are 68 records with a flood depth over 1 m, 97 records with a flood depth over 1.5 m and 46 records with a flood depth over 2.5 m. Another possible reason is that the flooding in these two depressions is mainly caused by storm surges crossing the dam, and the model does not map the nonlinear relationship between them.
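The per-area maximum error, and whether it coincides with the flood peak, can be extracted as in the following sketch. The area labels, function name and depth values are hypothetical; the study's own error curves are those in Fig. 11.

```python
import numpy as np

def max_abs_error_by_area(observed_by_area, predicted_by_area):
    """For each area, return (maximum absolute error, time index, at_peak flag)."""
    result = {}
    for area, observed in observed_by_area.items():
        y = np.asarray(observed, dtype=float)
        p = np.asarray(predicted_by_area[area], dtype=float)
        abs_err = np.abs(p - y)
        idx = int(np.argmax(abs_err))
        # at_peak is True when the largest error falls on the observed peak
        result[area] = (float(abs_err[idx]), idx, idx == int(np.argmax(y)))
    return result

# Hypothetical depth series in metres for two areas, for illustration only.
observed_by_area = {"area 5": [0.2, 0.9, 1.8, 1.0], "area 7": [0.0, 0.1, 0.3, 0.1]}
predicted_by_area = {"area 5": [0.3, 1.0, 1.5, 1.1], "area 7": [0.0, 0.1, 0.25, 0.1]}

errors = max_abs_error_by_area(observed_by_area, predicted_by_area)
print(errors)
```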

Fig. 11 The comparison of flood prediction error in nine areas

4.4.4 Prediction Reliability

Another important requirement for flood prediction is the reliability of models [56, 57]. Flood level warnings are more likely to attract public attention than exact flood depths. As shown in Table 7, the predicted flood depth values are classified by level, and their accuracy ratios at levels 0–5 are counted. The higher the accuracy ratio, the higher the reliability of the model. According to this definition, BMC-EL has the highest overall prediction reliability among the six models, at 92.15%. RF provides a good result, with a total reliability of 83.72%. The total reliability of the other four models is not significantly different: 76.27% for QSVM, 74.66% for LR, 72.43% for the BPNN and 67.20% for LSVM, the lowest.
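The reliability measure described above can be sketched as follows: depths are mapped to intensity levels, and the accuracy ratio is the share of samples whose predicted level equals the observed level, counted per observed level and overall. The level boundaries used here are placeholders chosen for illustration (only the 2.5 m bound for level 5 is mentioned in the text); the actual classification is the one defined in the study's tables. Function names and depth values are likewise hypothetical.

```python
import numpy as np

# Hypothetical upper bounds (m) of levels 0-4; depths above the last bound are level 5.
LEVEL_EDGES_M = np.array([0.1, 0.5, 1.0, 1.5, 2.5])

def to_level(depth_m):
    """Map flood depths (m) to integer intensity levels 0-5."""
    # right=True: a depth exactly on a bound stays in the lower level.
    return np.digitize(np.asarray(depth_m, dtype=float), LEVEL_EDGES_M, right=True)

def reliability_by_level(observed_m, predicted_m, n_levels=6):
    """Accuracy ratio per observed level, plus the total accuracy ratio."""
    obs = to_level(observed_m)
    pred = to_level(predicted_m)
    per_level = {}
    for level in range(n_levels):
        mask = obs == level
        # None marks a level with no observed samples (nothing to evaluate).
        per_level[level] = float(np.mean(pred[mask] == level)) if mask.any() else None
    total = float(np.mean(pred == obs))
    return per_level, total

# Hypothetical depths in metres, for illustration only.
observed = [0.0, 0.05, 0.3, 0.7, 1.2, 2.0]
predicted = [0.0, 0.2, 0.4, 0.8, 1.6, 2.1]
per_level, total = reliability_by_level(observed, predicted)
print(per_level, round(total, 4))
```

In this sketch a level without observed samples is reported as `None` rather than as 100%, which makes the absence of test data at that level explicit.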

Table 7 Reliability of ML models in various flood intensity levels

Although the total reliability of each of the six ML models is at least 67.20%, this does not mean that flood warnings can be issued based on their prediction results. The prediction at each flood intensity level must be highly reliable, especially at level 4 (red warnings) and level 5 (black warnings). During the 1822 MANGKHUT event, no flood exceeded 2.5 m above the road surface in Macao, so the observed values contain no level 5 data. The predicted values are consistent with the observations and likewise contain no level 5 data. All six models therefore show 100% reliability at level 5, but this figure carries little confidence: the current test set lacks level 5 samples, which limits the performance test of the models at that level. The models have nearly 90% reliability at level 0, at which no warning is issued, and lower reliability at the higher flood intensity levels, at which warnings are issued. The reliability of the LR, LSVM, QSVM and BPNN at levels 2, 3 and 4 is lower than 34.26%, while that of RF at levels 2 and 3 is only 37.50% and 50.29%, respectively. Thus, these five models are considered unreliable and cannot be used for flood warning. As shown in Fig. 12, BMC-EL provides a very good result, with its lowest reliability of 65.31% at level 2 and a reliability above 80% at all other levels; it is very stable and can be used to guide flood warnings.

Fig. 12 The comparison of ML models’ reliability

4.4.5 Ablation Experiment

In this study, we designed an ensemble approach based on BMC to enhance the reliability of the predictions. To further validate this design, an ablation experiment was conducted [58].

In the experiment, the BPNN and RF networks together are called the base learner layer (BLL). We set up two groups for comparison. The first group is the original BLL, in which the BMC is removed and the prediction outputs are combined by simple averaging. The second group uses BMC to combine the prediction outputs, with all other model settings unchanged. The experimental results are shown in Table 8. The prediction reliability of the BMC ensemble method is significantly higher than that of the original BLL: the total reliability increases by 16.85%. In particular, the reliability increases by 47.50% at level 2 and by 69.36% at level 3. The model based on the BMC strategy has a total reliability of 92.15%. From level 0 to level 5, its reliability is 98.12%, 90.49%, 65.31%, 80.49%, 91.01% and 100%, respectively. The lowest reliability, 65.31% at level 2, is still higher than 60% and can be trusted, given that the damage to the city caused by level 2 floods is relatively weak. The experiments show that the BMC strategy effectively improves the reliability of the ensemble learning model for flood forecasting.
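The structure of such an ablation can be sketched as below: the base learner outputs are held fixed and only the combination rule changes. This is not the BMC procedure of this study. A fixed weighted sum with hypothetical weights stands in for the learned combination, purely to show how the two groups differ; all names and values are illustrative assumptions.

```python
import numpy as np

def combine_simple_average(base_predictions):
    """Ablated group: equal-weight average of the base learner outputs."""
    return np.mean(np.asarray(base_predictions, dtype=float), axis=0)

def combine_weighted(base_predictions, weights):
    """Stand-in for a learned combiner: normalized weighted sum of the outputs."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(base_predictions, dtype=float)

def mse(observed, predicted):
    return float(np.mean((np.asarray(predicted) - np.asarray(observed)) ** 2))

# Hypothetical base learner outputs and observations (m), for illustration only.
observed = np.array([0.2, 0.8, 1.5])
bpnn_out = np.array([0.4, 1.0, 1.8])
rf_out = np.array([0.2, 0.8, 1.4])
base_predictions = [bpnn_out, rf_out]

avg_pred = combine_simple_average(base_predictions)
# Hypothetical weights; a real combiner would estimate them from validation data.
weighted_pred = combine_weighted(base_predictions, weights=[0.25, 0.75])

mse_avg = mse(observed, avg_pred)
mse_weighted = mse(observed, weighted_pred)
print(mse_avg, mse_weighted)
```

Keeping the base outputs identical across both groups isolates the effect of the combination rule, which is the point of the comparison in Table 8.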

Table 8 Ablation experiment results

5 Conclusions

With the development of IoT technology, cities are becoming increasingly intelligent. The extensive use of IoT equipment has made it possible to collect big data on floods and flood factors, which provides data support for applying ML technology to flood forecasting. Our research contributes to this field by studying how EL technology can be applied to improve the accuracy and reliability of urban coastal flood forecasting. To achieve this goal, an EL method based on BMC, BPNN and RF models was proposed in our system, and the diversity and uniformity of the training set were increased by flood intensity classification and K-fold cross-validation. The reliability of the models was then evaluated by the accuracy ratio at different flood intensity levels.

During typhoons, design data were collected from nine different areas in Macao to train and test the BMC-EL, LR, LSVM, QSVM, BPNN and RF models, and MSE, MAE, RMSE and R2 were adopted to evaluate the performance of the models. Concerning the prediction accuracy of the six models, BMC-EL has the best prediction results, followed by RF. These two ensemble models perform better than the LR, LSVM, QSVM and BPNN. Concerning the prediction reliability of the six models, BMC-EL has the best reliability: its lowest reliability is 65.31% at level 2, and its total reliability is 92.15%. The reliability of the ensemble learning model using the BMC strategy is 16.85% higher than that of the original BLL. The reliability of the BMC-EL model is much better than that of the other models, and its prediction output can provide effective guidance for city managers issuing flood warnings.

Future work will use a richer data set to organize and develop models so as to forecast street flooding with higher accuracy. With the development of information technology, data for flooding and flood factors can be collected in richer forms. For example, wireless sensors will be installed on streets to collect flood data, citizens will generate flood reports from smart-phone apps, and water-level monitoring stations will take more storm surge flooding pictures. These multi-source data can be used to further enrich the prediction of models. More ML and DL models should be studied in combination with hydrophysical models, so as to develop more accurate coastal flood prediction models, increase the lead time, improve the reliability and enhance the degree of urban intelligence.