1 Introduction

The changes in precipitation patterns expected over the coming decades (Hurk et al. 2006; Bates et al. 2008; Murphy et al. 2009; Romero et al. 2011), together with urban growth and higher population and asset densities, increase the risk of urban pluvial flooding (Ashley et al. 2005; Veldhuis and Clemens 2009). Floods of this kind can cause considerable damage in cities. The estimated damage due to heavy rain in the Netherlands in autumn 1998 amounted to 408 million Euros (Jak and Kok 2000; European Central Bank 1998). Likewise, in the UK, the annual average damage from intra-urban flooding is about a quarter of the total annual average flood-related damage (Blanc et al. 2012). Other studies claim that 40 % of flood damage and associated economic losses are attributable to pluvial flooding (Douglas et al. 2010). Such damage levels highlight the need for reliable models that can predict how heavy rainfall leads to pluvial flooding and damage.

There is a relatively broad body of scientific knowledge on hazard and damage modeling of coastal and river flooding (e.g., Horritt and Bates 2001; Aronica et al. 2002; Apel et al. 2004, 2009; Merz et al. 2004; Booij 2005; Knebl et al. 2005; Hoes and Schuurmans 2006; Jonkman et al. 2008a, 2008b; Kok et al. 2009; Maaskant et al. 2009; Freni et al. 2010; Pistrika and Jonkman 2010).

Flooding in the urban environment, where overland flows depend on the complexity of the built infrastructure, has been comparatively less studied. The recent availability of high-resolution digital elevation models (DEMs) has allowed flood modeling research to explore urban topography in greater detail (Dongquan et al. 2009; Kunapo et al. 2009; Maksimović et al. 2009; Jeong et al. 2010; Neal et al. 2011; Diaz-Nieto et al. 2011; Tsakiris and Bellos 2014; Tsakiris 2014; Bellos and Tsakiris 2014; Pistrika et al. 2014; Ravazzani et al. 2014).

An important bottleneck in flood risk analysis is the scarcity of damage data (Pistrika et al. 2014). Spekkers et al. (2014) related the damage reported in insurance claims to a range of environmental and socio-economic characteristics, which together explained close to a quarter of the variance in claim occurrence. One of the reasons for this low explanatory power is the coarse spatial resolution of the rainfall grids (1 km²) and the damage data (postal-district aggregations) used in that study.

Reports about flood incidents made by citizens, hereafter referred to as ‘reports’, provide a valuable source of information about flood occurrence and damage. Reports can be used to analyze the impacts of the typically shallow water depths of pluvial floods, and they can even account for intangible damage (Arthur et al. 2009; Caradot et al. 2010; Veldhuis and Clemens 2010; Veldhuis 2011; Veldhuis et al. 2011).

Despite the proven importance of topography in coastal and river flooding, and the availability of high-resolution DEMs and flood reports, the relationship between the location of pluvial flooding incidents and the topographic conditions of the underlying terrain has not yet been analyzed. This work builds on results from previous exploratory analyses at the municipal level, which showed higher densities of reports in areas towards the outflow points of urban overland flow-paths (Gaitan et al. 2012). The present study statistically analyzes whether overland flow-paths constrain the spatial distribution of flood incidents in a delta city, which is characterized by small ground level variations. This is a novel implementation that tests spatial autocorrelation on drainage distances between connected subwatersheds, including non-adjacent ones, along urban overland flow networks. The remainder of this paper is structured as follows: Section 2 presents the area of study, data inputs and models used; Section 3 presents and discusses the results; and Section 4 provides the conclusions.

2 Data and Methods

The general approach of this study is to aggregate reports into urban subwatersheds and then compute the report count and catchment area of each subwatershed. These two variables are compared to determine whether there are trends in the location and occurrence of reports with respect to the underlying topographic conditions. The count of reports is used as a proxy for pluvial flooding damage. Locations towards the downstream end of intra-urban watersheds have larger catchment areas. They are therefore likely to be exposed to higher overland flows during heavy rains and are expected to have a higher occurrence of reports.
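The aggregation step can be sketched in a minimal form. The Python function below counts reports per subwatershed and computes each subwatershed’s enclosing area from a raster of subwatershed labels. The sketch makes two hypothetical assumptions: that subwatersheds are available as a grid of labels aligned with the DEM, and that geocoded reports have already been converted to (row, column) cell indices. The function name and data layout are illustrative and are not the implementation used in the study. Drainage (catchment) areas additionally require the downstream connectivity described in Section 2.3.1.

```python
from collections import Counter


def aggregate_reports(label_grid, report_cells, cell_area):
    """Count reports and enclosing area per subwatershed (hypothetical helper).

    label_grid   -- 2-D list; label_grid[r][c] is the subwatershed ID of a DEM cell,
                    or None for excluded cells (e.g. canals or no-data).
    report_cells -- (row, col) indices of the cells containing geocoded reports.
    cell_area    -- area of one DEM cell in m^2 (0.25 for a 0.5 m x 0.5 m grid).
    """
    # Enclosing area: number of cells carrying each label times the cell area.
    cells = Counter(lab for row in label_grid for lab in row if lab is not None)
    areas = {lab: n * cell_area for lab, n in cells.items()}
    # Report count: reports whose cell falls inside each subwatershed.
    counts = {lab: 0 for lab in areas}
    for r, c in report_cells:
        lab = label_grid[r][c]
        if lab is not None:  # reports located in excluded cells are dropped
            counts[lab] += 1
    return counts, areas


grid = [["a", "a", "b"],
        ["a", "b", "b"],
        [None, "b", "b"]]
counts, areas = aggregate_reports(grid, [(0, 0), (1, 2), (2, 2), (2, 0)], cell_area=0.25)
print(counts)  # {'a': 1, 'b': 2}
print(areas)   # {'a': 0.75, 'b': 1.25}
```

With the 0.5 m × 0.5 m DEM used here, each labelled cell contributes 0.25 m² to the enclosing area of its subwatershed.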

2.1 Area of Study

In this study, data for a set of urban catchments in Rotterdam are analyzed. Rotterdam is located along the last 40 km of the New Meuse river in the Rhine-Meuse Delta (Fig. 1a). It is one of the largest cities in The Netherlands, has the largest port in Europe, and has close to 600,000 inhabitants. As a polder city, Rotterdam has terrain elevations ranging from 6 m below to 10 m above sea level. The city is low-lying, heavily urbanized, densely populated and vulnerable to pluvial flooding. Citizens’ reports about rain-related incidents and a very detailed digital elevation model (DEM) are both available for research. Rotterdam’s polder structure creates land areas with isolated surface waters, which enables a straightforward analysis of overland flows. Ground level differences are small, with an average slope of 1.8 % and a standard deviation of 2.8 %. In such flat terrain, flow-paths and watersheds can only be modeled from highly detailed DEMs. These characteristics make Rotterdam an interesting case for testing possible links between the location of flood reports and underlying terrain features in a delta city.

This study focuses on two spatial scales: the District of Kralingen-Crooswijk and the Neighborhood of Kralingen-West, covering approx. 13 and 1 km² respectively. In this paper, the former is referred to as the ‘district level’ and the latter as the ‘neighborhood level’. Kralingen-Crooswijk is a district in Rotterdam comprising densely urbanized, industrial and park areas. Overland flows in this district are isolated from adjacent areas, with one exception. Rubroek, one of the district’s neighborhoods, shares overland flow with the Centrum District and was therefore excluded from the analysis. The neighborhood of Kralingen-West mostly consists of residential and commercial areas.

Fig. 1

a View of eastern Rotterdam; municipal borders and areas of study are enclosed with different line colors, b Visualization of flood reports and relative population density in neighborhoods of the Kralingen-Crooswijk district (Rubroek is excluded)

2.2 Available Data Sources

A database of transcripts of telephone reports about pluvial flooding made by Rotterdam’s inhabitants was made available for this study. It comprises 38,657 reports made from 2004 to 2011. Its fields record the neighborhood, street name, house number, a short description of the flooding incident, and the dates on which the incident was reported and solved. Of these, 36 records had no address and 12,663 had no house number; these could not be used for the analysis, leaving a dataset of 25,958 reports. A Python script was written to query the online Dutch public geo-information services (Publieke Dienstverlening Op de Kaart Loket 2013) and geocode the reports that had a street name and house number. In total, 21,577 reports were successfully geocoded. The remaining 4,417 unrecognized reports could not be used in the analysis either; they included 2,922 records with zero as house number and 1,495 records with addresses that were not available in the public register.

A digital elevation model (DEM) was also used. It was produced from airborne Light Detection and Ranging (LiDAR) measurements of ground levels and includes the heights of urban objects such as streets, sidewalks, buildings, cars and trees. Some blank areas in the DEM, represented by no-data cells, are associated with LiDAR signal noise caused by wet surfaces and reflective materials. Others result from shadow effects at the base of tall objects. The DEM has a spatial resolution of 0.5 m × 0.5 m, a vertical precision of 5 cm, a systematic error of 5 cm, a random error of 5 cm, and a minimum precision of 15 cm at two standard deviations (Zon 2011). A land-use map of Rotterdam, with polygons for each land-use class, was also available.
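The record filtering described above can be sketched as a simple triage of each record before geocoding. The `Report` layout and the category names below are hypothetical, since the database’s actual field names are not given. The call to the geocoding service itself is deliberately left out because its interface is not described here. In the study, records with a zero house number only surfaced when geocoding failed; this sketch flags them up front for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Report:
    # Hypothetical record layout; only the fields needed for filtering are kept.
    street: Optional[str]
    house_number: Optional[str]


def triage(rec: Report) -> str:
    """Assign a record to one of the filtering categories described in the text."""
    if not rec.street:
        return "no_address"
    number = (rec.house_number or "").strip()
    if not number:
        return "no_house_number"
    if set(number) == {"0"}:
        # A zero house number is a placeholder; such records fail geocoding.
        return "zero_house_number"
    return "candidate"  # street and house number present: send to the geocoder


def summarize(records: Iterable[Report]) -> Counter:
    """Count records per triage category."""
    return Counter(triage(r) for r in records)


sample = [
    Report("Voorbeeldstraat", "12"),
    Report(None, None),
    Report("Voorbeeldstraat", ""),
    Report("Voorbeeldstraat", "0"),
]
print(summarize(sample))
```

Keeping the categories separate makes it easy to check that the counts per category add up to the total number of records.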

2.3 Extraction of Hydrological Characteristics from the DEM

Some definitions are required for the rest of this paper. The term ‘overland flow-paths’ refers to the routes that rainfall follows when running off over the watershed surface, as determined by the underlying slopes. A ‘subwatershed’ is one of the hydrological subunits that make up a watershed. Subwatersheds are delimited by drainage boundaries, and each drains into a specific outflow point along the overland flow-paths of that watershed. In this work, outflow points are placed where a minimum drainage-area threshold is reached, so the enclosing areas of subwatersheds are generally of similar size. The area enclosed by a subwatershed can differ from its drainage area. The former is simply the area within the subwatershed boundaries. The latter is the total overland area draining into its outflow point, including the drainage areas of upstream subwatersheds. Differences between enclosing and drainage areas in a network of subwatersheds are further explained in Section 2.3.1. The delineation of flow-paths and watersheds follows the approach proposed by Jenson and Domingue (1988) and Tarboton et al. (1991). This delineation results in a tree-shaped network of subwatersheds that differentiates places in a city by their underlying overland drainage areas. The network is therefore suitable for analyzing the vulnerability of a given subwatershed to flooding as a result of depression filling (Veldhuis et al. 2011). Pistrika et al. (2014) and Bellos and Tsakiris (2014) used DEMs that include the heights of buildings and other urban objects to describe the topographic complexity of built-up areas in flood risk assessments. The following assumptions were made for the delineation of overland flow routes:

  • Flows into and out of the underground sewer network are blocked, or the network is saturated. This assumption was also made by Diaz-Nieto et al. (2011). It implies that reports are assumed to be made under sewer surcharge or sewer blockage conditions (Veldhuis et al. 2011).

  • Rainwater falling on buildings, tree canopies and cars drains to the streets. Urban overland-flow routes are delineated from an elevation model that includes urban features such as buildings, cars and trees. Changes in these features over time are not considered in this study; the DEM used represents the situation captured by a series of LiDAR surveys during 2008.

  • Rainwater in surface water channels does not overflow onto the streets. Water in canals is assumed to be managed by a system independent of the sewers, which is normally the case in polder systems. Canals are treated as the outlets of the overland-flow paths.

  • The surface waters in the studied areas are isolated hydrological units, without interaction with adjacent hydrological units.

The DEM was prepared by first clipping the study areas and then removing areas corresponding to canals, lakes and rivers, using administrative and land-use maps. The original DEM represents the terrain under dry conditions (Zon 2011), so deriving run-off directions directly from it would produce a model composed of isolated urban ponds. Under continuing rainfall, however, local ponds fill up until the water exits at the lowest point of their water divides and flows into a nearby urban pond or into a body of water (Maksimović et al. 2009). For that reason, the DEM was treated with a filling process. This process raises the water levels of urban subwatersheds that initially have no outflow point until they are connected to an urban water body or to another subwatershed. A run-off direction model was then derived from the prepared DEM with the D8 algorithm and used to compute a flow accumulation model (e.g., Tarboton et al. 1991; Olivera and Maidment 1999). The minimum flow accumulation threshold was set at a catchment area of 2,000 m², comparable to the area of a street 100 m long and 20 m wide. This threshold was used to delineate the subwatersheds, with the end point of each overland-flow route taken as the exit point of the corresponding subwatershed.
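The preprocessing chain above (depression filling, D8 flow directions, flow accumulation, and an area threshold) can be illustrated with a small pure-Python sketch. This is not the procedure used in the study. Instead of the Jenson and Domingue (1988) filling, it swaps in a Priority-Flood variant that raises each filled cell by a small, hypothetical increment `eps` so that D8 can route water across filled depressions. It also treats only the grid edge as an outlet, whereas in the study the removed canals would act as outlets too. Cell size and threshold are parameters; with 0.5 m cells, the 2,000 m² threshold corresponds to 8,000 cells.

```python
import heapq
import math

# Offsets of the eight D8 neighbors.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def fill_depressions(dem, eps=1e-3):
    """Priority-Flood + epsilon: raise closed depressions so every cell drains to the edge."""
    rows, cols = len(dem), len(dem[0])
    z = [row[:] for row in dem]
    closed = [[False] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):  # edge cells act as outlets
                heapq.heappush(heap, (z[r][c], r, c))
                closed[r][c] = True
    while heap:
        elev, r, c = heapq.heappop(heap)
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not closed[nr][nc]:
                closed[nr][nc] = True
                # A neighbor below the current spill level is raised just above it.
                z[nr][nc] = max(z[nr][nc], elev + eps)
                heapq.heappush(heap, (z[nr][nc], nr, nc))
    return z


def d8_receivers(z, cell_size):
    """For each cell, the (row, col) of its steepest downslope neighbor, or None."""
    rows, cols = len(z), len(z[0])
    receiver = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    slope = (z[r][c] - z[nr][nc]) / (cell_size * math.hypot(dr, dc))
                    if slope > best:
                        best, receiver[r][c] = slope, (nr, nc)
    return receiver


def flow_accumulation(z, receiver):
    """Number of cells draining through each cell, the cell itself included."""
    rows, cols = len(z), len(z[0])
    acc = [[1] * cols for _ in range(rows)]
    # D8 flow is strictly downhill, so visiting cells from high to low is a valid order.
    cells = sorted(((z[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True)
    for _, r, c in cells:
        if receiver[r][c] is not None:
            rr, cc = receiver[r][c]
            acc[rr][cc] += acc[r][c]
    return acc


def channel_mask(acc, cell_size, min_area):
    """True where the contributing area reaches the drainage-area threshold."""
    return [[a * cell_size ** 2 >= min_area for a in row] for row in acc]


# Toy 5 x 5 DEM: a pit in the center and a low outlet in the bottom-right corner.
dem = [[5, 5, 5, 5, 5],
       [5, 4, 4, 4, 5],
       [5, 4, 1, 4, 5],
       [5, 4, 4, 4, 5],
       [5, 5, 5, 5, 0]]
z = fill_depressions(dem)
receivers = d8_receivers(z, cell_size=1.0)
acc = flow_accumulation(z, receivers)
mask = channel_mask(acc, cell_size=1.0, min_area=20.0)
print(acc[4][4])  # 25: after filling, every cell drains to the outlet
```

Without the filling step, the central pit would have no downslope neighbor and would trap the flow of its surroundings, which is the "isolated urban ponds" situation described above.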

2.3.1 Definition of Non-Adjacent Connectivity

An example of a tree graph representing the connections between subwatersheds is shown in Fig. 2a. In this graph, each subwatershed has an enclosing area of one unit, and the numbers in brackets indicate the drainage area at the exit of each subwatershed. Subwatershed c, for instance, has a drainage area of 3 units, which equals the sum of the enclosing areas of c itself, a, and b. Subwatershed g has an enclosing area of 1 unit but a drainage area of 7 units, the sum of the areas enclosed by all the subwatersheds in this watershed. For f, on the other hand, which has no upstream subwatersheds, the enclosing and drainage areas are equal. An adjacency matrix was built for the full network of subwatersheds on the basis of the adjacent connections along flow-paths. Figure 2b shows the adjacency matrix of the tree presented in Fig. 2a. Each entry indicates whether the subwatershed of a given row is connected downstream to the subwatershed of a given column: a value of 1 means there is a downstream connection, and a value of 0 means there is none. For example, subwatersheds a and b are adjacently connected to c, whereas c is only connected to e. A watershed matrix can be computed from the adjacency matrix as \(W = (I - A)^{-1}\), where \(A\) is the adjacency matrix and \(I\) is the identity matrix of the same size as \(A\). \(W\) accounts for the full downstream connectivity of subwatersheds, which distinguishes it from the adjacency matrix, since the latter only indicates adjacent connections. The watershed matrix in Fig. 2c has been calculated from the adjacency matrix of Fig. 2b. In this example, a is connected downstream to c, e, and g, whereas g has no downstream connections. Upstream tributaries can be found by looking at the columns; column e, for example, shows that this subwatershed receives overland flows from a, b, c, and d.
The watershed matrix makes it possible to identify each of the studied trees and their internal connections. A watershed matrix was computed for the area of study to determine all downstream connections between subwatersheds, including non-adjacent ones. This matrix was then used to compute the differences in catchment areas between connected subwatersheds.
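The watershed matrix of the example in Fig. 2 can be reproduced with a short sketch. Because a network of subwatersheds has no cycles, \(A\) is nilpotent and \((I - A)^{-1} = I + A + A^2 + \dots + A^{n-1}\). \(W\) can therefore be accumulated with integer matrix products instead of a numerical inverse. In a tree, each entry is 0 or 1 because there is a single path between any two connected subwatersheds. The helper names are illustrative. Drainage areas then follow as column sums of \(W\) weighted by the enclosing areas.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def watershed_matrix(A):
    """W = (I - A)^-1 for the adjacency matrix A of an acyclic network.

    A is nilpotent when there are no cycles, so (I - A)^-1 equals the finite
    series I + A + A^2 + ... + A^(n-1), which is summed here exactly.
    """
    n = len(A)
    W = [[int(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in W]  # A^0
    for _ in range(n - 1):
        power = mat_mul(power, A)
        W = [[W[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return W


# Tree of Fig. 2a: each arc points from a subwatershed to its downstream neighbor.
names = ["a", "b", "c", "d", "e", "f", "g"]
arcs = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e"), ("e", "g"), ("f", "g")]
idx = {s: k for k, s in enumerate(names)}
n = len(names)
A = [[0] * n for _ in range(n)]
for up, down in arcs:
    A[idx[up]][idx[down]] = 1  # the row subwatershed drains directly into the column one

W = watershed_matrix(A)
enclosing = [1] * n  # one unit of enclosing area per subwatershed
drainage = [sum(W[i][j] * enclosing[i] for i in range(n)) for j in range(n)]
print(dict(zip(names, drainage)))  # {'a': 1, 'b': 1, 'c': 3, 'd': 1, 'e': 5, 'f': 1, 'g': 7}
print([names[j] for j in range(n) if j != idx["a"] and W[idx["a"]][j]])  # ['c', 'e', 'g']
print([names[i] for i in range(n) if i != idx["e"] and W[i][idx["e"]]])  # ['a', 'b', 'c', 'd']
```

The three printed results match the drainage areas in Fig. 2a, the downstream connections of a, and the upstream tributaries of e described in the text. For thousands of subwatersheds, a sparse representation or a tree traversal would be preferable to dense matrix products.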

Fig. 2

a Example of a tree of subwatersheds. The arcs represent downstream connections between adjacent subwatersheds. Letters are arbitrary names given to the subwatersheds. The root of this tree is g. b Adjacency matrix (A) of the network presented in (a), with subwatersheds labelled in rows and columns. c Watershed matrix (W) of the tree in (a)

2.4 Analysis of Spatial Distribution of Reports in Relation to Overland-Flow Paths

Vulnerability due to depression filling is expected to be higher at locations receiving larger overland inflows. Therefore, subwatersheds located further downstream in the overland-flow network are expected to have higher report counts than those located upstream. This hypothesis implies that reports are not randomly distributed throughout the urban space. This can be checked by testing whether report data display spatial structure under a spatial weighting based on the overland-flow network. In such an approach, the units of analysis and the distances between them must be defined by the underlying overland-flow network rather than by Euclidean distances.

Three different tests were performed to assess whether the spatial distribution of reports displays patterns, at both the district and the neighborhood scales described in Section 2.1. First, a simple Average Nearest Neighbor test was applied to check for clustering of reports. In this test, the distance between the location of each report and its nearest neighbor is measured. The average of these nearest-neighbor distances is then computed and compared with the average expected under a random distribution. Further details of this method can be found in Illian et al. (2008, pp. 126–127) and Sinclair (1985).
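A minimal sketch of the Average Nearest Neighbor statistic follows. It assumes the Clark-Evans formulation with expected mean distance \(0.5/\sqrt{n/A}\) and standard error \(0.26136/\sqrt{n^2/A}\), the form used, for example, by the ArcGIS Average Nearest Neighbor tool, and it applies no edge correction. The study does not state which variant was used. Negative z-scores indicate clustering and positive ones indicate dispersion.

```python
import math


def average_nearest_neighbor(points, area):
    """Clark-Evans style Average Nearest Neighbor test (no edge correction).

    points -- list of (x, y) report locations
    area   -- size of the study area, in the same squared units
    Returns (observed mean distance, expected mean distance, z-score).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    # Brute-force nearest-neighbor distances; a spatial index would be used for large n.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)           # mean distance under spatial randomness
    std_error = 0.26136 / math.sqrt(n * n / area)  # standard error of that mean
    return observed, expected, (observed - expected) / std_error


# Six points packed into two tight groups inside a 100 x 100 area.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50), (51, 50)]
obs, exp, z = average_nearest_neighbor(pts, area=100 * 100)
print(obs, round(exp, 2), round(z, 2))
```

The observed mean distance (1.0) is far below the expected one, which yields a clearly negative z-score, the same signature as the clustered reports in this study.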

Second, the spatial autocorrelation of reports aggregated into subwatersheds was tested using a Global Moran’s I test. The report counts per subwatershed and the Euclidean distances between subwatershed centroids were used as input variables for this test. Further detail on spatial autocorrelation and the Global Moran’s I test can be found in O’Sullivan and Unwin (2010, pp. 195–206) and in Okabe and Sugihara (2012, pp. 137–152). If the spatial distribution of reports is clustered given the arrangement of subwatersheds, the null hypothesis of the Global Moran’s I test, a random spatial distribution, should be rejected.
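The statistic can be sketched as follows. The sketch assumes inverse Euclidean distance weights between subwatershed centroids and computes the variance of I under the normality assumption. The study does not specify its weighting scheme or variance assumption, so both are illustrative choices, as are the centroids and counts in the usage example.

```python
import math


def morans_i(values, weights):
    """Global Moran's I with E[I] and a z-score under the normality assumption.

    values  -- attribute per spatial unit (e.g. report counts per subwatershed)
    weights -- n x n spatial weights with a zero diagonal
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    moran = (n / s0) * cross / sum(d * d for d in dev)
    expected = -1.0 / (n - 1)
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(weights[i]) + sum(weights[k][i] for k in range(n))) ** 2 for i in range(n))
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    return moran, expected, (moran - expected) / math.sqrt(variance)


def inverse_distance_weights(centroids):
    """w_ij = 1 / Euclidean distance between centroids i and j, 0 on the diagonal."""
    n = len(centroids)
    return [[0.0 if i == j else 1.0 / math.dist(centroids[i], centroids[j])
             for j in range(n)] for i in range(n)]


# Hypothetical example: six subwatershed centroids along a line, with high report
# counts at one end and low counts at the other.
centroids = [(float(x), 0.0) for x in range(6)]
counts = [10, 9, 8, 1, 0, 1]
i_stat, e_i, z = morans_i(counts, inverse_distance_weights(centroids))
print(round(i_stat, 3), round(e_i, 3), round(z, 2))
```

Here I lies above its expectation of −1/(n − 1) = −0.2, which gives a positive z-score, as expected when similar counts sit next to each other.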

Third, since the overland flows between subwatersheds are determined by their connectivity, another Global Moran’s I test was performed using ‘drainage distances’ along overland flow-paths instead of Euclidean distances. This test was run only over pairs of subwatersheds found to be connected in the watershed matrix, and the distance used was the difference between their drainage areas. This type of distance quantifies how far apart two subwatersheds are along the overland flow gradient. For example, the flow-paths connecting subwatersheds e and g and subwatersheds f and g may be of similar length, yet the differences in catchment area are 2 and 6 units respectively (see Fig. 2a). In other words, two connected subwatersheds can be geographically close to each other and still be far apart in terms of catchment size. Using the difference in catchment areas as the distance metric for the spatial autocorrelation test makes it possible to check whether the occurrence of flood reports is influenced by the underlying overland drainage conditions. Comparing the results of the Global Moran’s I test in Euclidean space with those constrained to the overland flow networks enables us to analyze the influence that depression filling may have on the occurrence and distribution of reports.
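The drainage-distance weights can be sketched for the tree in Fig. 2a. Here the connected pairs are obtained by walking from each subwatershed to the root; for a tree, this yields the same pairs as the non-zero off-diagonal entries of the watershed matrix. The inverse-distance form \(w_{ij} = 1/|D_i - D_j|\), where \(D_i\) denotes a drainage area, is an assumption for illustration. So are the function names and the report counts in the usage example. Pairs that are not connected receive zero weight, which restricts the test to the flow-path connection space.

```python
def downstream_pairs(receiver):
    """All (upstream, downstream) pairs, adjacent or not.

    receiver maps each subwatershed to the one it drains into (None for a root).
    """
    pairs = []
    for node in receiver:
        nxt = receiver[node]
        while nxt is not None:
            pairs.append((node, nxt))
            nxt = receiver[nxt]
    return pairs


def drainage_areas(receiver, enclosing):
    """Own enclosing area plus the enclosing areas of all upstream subwatersheds."""
    area = dict(enclosing)
    for up, down in downstream_pairs(receiver):
        area[down] += enclosing[up]
    return area


def drainage_distance_weights(receiver, enclosing):
    """Symmetric weights 1 / |D_i - D_j| for connected pairs, 0 for unconnected ones."""
    names = sorted(receiver)
    idx = {s: k for k, s in enumerate(names)}
    area = drainage_areas(receiver, enclosing)
    w = [[0.0] * len(names) for _ in names]
    for up, down in downstream_pairs(receiver):
        # Positive for positive enclosing areas: the downstream area contains the upstream one.
        dist = area[down] - area[up]
        w[idx[up]][idx[down]] = w[idx[down]][idx[up]] = 1.0 / dist
    return names, w


def morans_i(values, w):
    """Global Moran's I statistic (no significance test) for weights w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Tree of Fig. 2a with one unit of enclosing area per subwatershed.
receiver = {"a": "c", "b": "c", "c": "e", "d": "e", "e": "g", "f": "g", "g": None}
names, w = drainage_distance_weights(receiver, {s: 1 for s in receiver})
k = {s: i for i, s in enumerate(names)}
print(w[k["e"]][k["g"]], w[k["f"]][k["g"]])  # 0.5 0.16666666666666666

# Hypothetical report counts per subwatershed, in the order of `names`.
reports = {"a": 2, "b": 0, "c": 1, "d": 3, "e": 1, "f": 0, "g": 2}
print(morans_i([reports[s] for s in names], w))
```

The pair e–g is therefore weighted three times more strongly than f–g, although both are adjacent to g, which reflects the drainage differences of 2 and 6 units given in the text.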

3 Results and Discussion

3.1 Computation of Non-Adjacent Connections at the District Level

After computing the watershed matrix, 1,717 independent trees were identified, with an average of 3 subwatersheds per tree. The total number of downstream connections between subwatersheds, including non-adjacent ones, was 115,282. This large number arises because every subwatershed is connected to all subwatersheds downstream of it, so the number of connections grows faster than the number of subwatersheds. For example, in an unbranched tree of 5 nodes with a single leaf, the number of downstream connections is \(\sum_{i} n_{i} = 4 + 3 + 2 + 1 + 0 = 10\), where \(n_{i}\) is the number of nodes downstream of node \(i\). More generally, a chain of \(L\) nodes has \(L(L-1)/2\) connections. If two more leaves were attached to the node next to the original leaf, the tree would have only two extra nodes, but each new leaf would add 4 downstream connections, raising the total from 10 to 18. In reality, branching does not occur only at the tips, although watershed networks tend to be more branched towards the tips. For the area of study, the presence of outlier trees with large numbers of subwatersheds can explain the large number of connections.
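The counting rule can be checked with a small sketch. It counts connections the way the watershed matrix does, with each (upstream, downstream) pair counted once, by summing over subwatersheds the number of subwatersheds downstream of each. The `receiver` mapping and the `chain` helper are illustrative representations of a tree, not the study’s data structures.

```python
def count_downstream_connections(receiver):
    """Sum, over all subwatersheds, of the number of subwatersheds downstream of each.

    This counts every (upstream, downstream) pair once, i.e. the non-zero
    off-diagonal entries of the watershed matrix W.
    """
    total = 0
    for node in receiver:
        nxt = receiver[node]
        while nxt is not None:  # walk from the node to the root of its tree
            total += 1
            nxt = receiver[nxt]
    return total


def chain(length):
    """Unbranched tree 0 -> 1 -> ... -> length-1 (node 0 is the single leaf)."""
    return {i: (i + 1 if i + 1 < length else None) for i in range(length)}


print(count_downstream_connections(chain(5)))    # 10
fig2 = {"a": "c", "b": "c", "c": "e", "d": "e", "e": "g", "f": "g", "g": None}
print(count_downstream_connections(fig2))        # 12
print(count_downstream_connections(chain(100)))  # 4950 = 100 * 99 / 2
```

Because a chain of L subwatersheds yields L(L − 1)/2 connections, a few long trees can account for most of the connections even when the average tree contains only 3 subwatersheds.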

3.2 Testing of Spatial Patterns of Reports Distribution

The results of the clustering tests are presented in Table 1.

Table 1 Description of clustering tests and results

The average nearest neighbor test, applied to the non-aggregated reports, resulted in strongly negative z-scores of −50 and −20 at the district and neighborhood scales respectively. The associated p-values are extremely low in both cases. The average distance between nearest pairs of reports is larger at the district scale than at the neighborhood scale. This result strongly suggests that individual reports are not randomly distributed in Euclidean space but are clustered.

Results from the Global Moran’s I test showed that, at a 99 % confidence level, the null hypothesis of a random spatial distribution of subwatershed-aggregated reports is rejected at the district level but not at the neighborhood level. In the latter case, however, the null hypothesis could still be rejected at an 80 % confidence level.

These patterns do not hold when the Global Moran’s I test is constrained to the flow-path gradient space: the hypothesis that reports are randomly distributed along overland flow-path gradients cannot be rejected. The p-values, 0.95 and 0.81 for the district and neighborhood levels respectively, are far from significant. These results show that flood reports are clustered when observed in open, Euclidean space, but that this clustering is not related to the modeled overland flow gradient.

3.3 Discussion

Other factors that can explain the observed clustering are the distribution of urban mosaics composed of buildings, streets, and green areas; the spatial variation of population density; and the layout of water infrastructure, such as canals and sewers.

Differences in the composition of the urban mosaic can explain the clear rejection of the null hypothesis in the average nearest neighbor test at the district level. The extent of green areas differs considerably between neighborhoods. For example, the neighborhood of Kralingen Bos mainly consists of a park, whereas Kralingen West hosts dense residential and commercial infrastructure (see Fig. 1). Highly impervious, dense residential areas are possibly more prone to local pluvial flooding than green areas, which have a higher permeability. Land uses of low imperviousness are not randomly distributed over the district. Their location has been determined by urban planning and development processes, resulting in heterogeneous permeability. This can explain the non-random pattern of report locations at the district level.

Population density is another factor that can explain the outcome of the Average Nearest Neighbor test. Reports are made by citizens; therefore, more densely populated areas are likely to present higher report counts. Fig. 1b shows the comparatively low number of reports in neighborhoods with lower population density. It also shows that neighborhoods with fewer green areas tend to have higher populations.

The z-score of the Average Nearest Neighbor test at the neighborhood level is weaker than at the district level but still substantial: reports remain strongly clustered within the neighborhood. This suggests that the factors driving a higher incidence of flood reports at the district level may also be highly heterogeneous at the neighborhood scale. Suppose heterogeneity in imperviousness and population density is responsible for the structured spatial distribution of reports at the district level. The results then suggest that this heterogeneity is also present at the neighborhood level and influences the distribution of reports there as well.

The results of the second test are consistent with these findings. At the district scale, where urban heterogeneity is greater, a clustering pattern is recognizable despite the spatial aggregation into subwatersheds of approximately 2,000 m². When the analysis is narrowed to the neighborhood level, the effect of this aggregation shows up as weaker evidence of clustering, with a p-value of 0.2. This suggests that aggregating reports into subwatersheds of 2,000 m² tends to mask the spatial structure that is clearly recognizable in the average nearest neighbor test. Alternatively, the higher p-value may also be due to a less marked variation of the factors influencing report occurrence at the neighborhood level. The second test aggregates reports into subwatersheds but still works in Euclidean space, so the influence of the overland flow gradient is better discussed on the basis of the third test.

The third test provides no evidence to support the idea that the incidence of reports is linked to overland flow-paths: reports show no preference for downstream locations along overland flow-paths. This random spatial distribution is further explored in Fig. 3, which presents the cumulative counts of subwatersheds, enclosing areas, and reports at the district level. The cumulative report count closely follows the cumulative area, suggesting that reports occur evenly along the overland flow gradient. Report density along this gradient (bars in Fig. 3) shows no increasing trend. Given the high number of reports per year, many of them are likely to be associated with relatively small rain events that do not trigger a depression-filling process. This is consistent with the results of Veldhuis et al. (2011), who found that local blockage of sewer inlets was the main reported cause of flood incidents, which occurred even during small rainfall events.

Fig. 3

Cumulative sums of reports and area, and report density per drainage-area bin. All bins contain the same number of subwatersheds.
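The quantities plotted in Fig. 3 can be sketched as follows. Subwatersheds are sorted by drainage area, and the cumulative shares of enclosing area and reports are accumulated along that order. Report density is then computed per bin, with each bin holding an equal number of subwatersheds. Defining density as reports per unit of enclosing area is an assumption, as are the function name and the report counts in the usage example.

```python
def fig3_series(drainage, enclosing, reports, n_bins):
    """Cumulative shares and binned report density along the drainage-area gradient.

    drainage, enclosing, reports -- per-subwatershed lists in the same order.
    Subwatersheds are sorted by drainage area and split into n_bins bins with
    (nearly) equal numbers of subwatersheds.
    """
    n = len(drainage)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of subwatersheds")
    order = sorted(range(n), key=lambda k: drainage[k])
    total_area, total_reports = sum(enclosing), sum(reports)
    cum_area, cum_reports = [], []
    area_so_far = reports_so_far = 0
    for k in order:
        area_so_far += enclosing[k]
        reports_so_far += reports[k]
        cum_area.append(area_so_far / total_area)
        cum_reports.append(reports_so_far / total_reports)
    density = []  # reports per unit of enclosing area in each bin (assumed definition)
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        density.append(sum(reports[k] for k in members) / sum(enclosing[k] for k in members))
    return cum_area, cum_reports, density


# Fig. 2a tree (drainage areas 1, 1, 3, 1, 5, 1, 7) with hypothetical report counts.
cum_area, cum_reports, density = fig3_series(
    drainage=[1, 1, 3, 1, 5, 1, 7], enclosing=[1] * 7,
    reports=[2, 0, 1, 3, 1, 0, 2], n_bins=2)
print(density)  # [1.6666666666666667, 1.0]
```

If reports were concentrated downstream, `cum_reports` would lag behind `cum_area` and the density values would rise from bin to bin. Fig. 3 shows neither pattern for the district.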

This result can be explained by the very low slopes in the city, which probably limit the onset of significant overland flows. In addition, the canals throughout the city, which are heavily regulated by pumps, can limit the formation of ponding due to sewer blockage, malfunction, or overloading during heavy rain events. As receivers of the outflows of overland flow-paths, canals can also explain the low average number of subwatersheds per tree in the studied area (see Section 3.1).

The evidence obtained in this study is not sufficient to disentangle the effects of imperviousness, population density, and proximity to a canal on the location of flood incidents, but future research could take up this analysis.

4 Conclusions and Outlook

In this paper, the spatial distribution of reported local flooding incidents was investigated in relation to overland flow-paths and associated subwatersheds. Spatial clustering tests were performed on areas reportedly susceptible to urban flooding. Their purpose was to determine whether the location of incidents was linked to the underlying topographic conditions in a city characterized by very low slopes. These tests were based on a dataset of flood reports and a highly detailed DEM. The methodology presented in this study can be used to test whether the spatial distribution of a variable is determined by the underlying urban overland drainage conditions. Topography has a documented importance in the analysis of flood occurrence and risks in environments with mild to steep slopes. Nevertheless, this study showed that it does not determine the location of reported flood incidents in a polder environment such as Rotterdam. This conclusion follows from the results of the Global Moran’s I test constrained to the flow-path connection space. In contrast, the average nearest neighbor test and the Global Moran’s I test applied to the subwatershed aggregation in Euclidean space showed that reports are clustered. This suggests that factors other than overland flow-path gradients, varying even at the sub-neighborhood scale, may contribute to the incidence of flood reports. Future research will assess the potential of population density, imperviousness, and water infrastructure to explain the occurrence of urban pluvial flood incidents.