A probabilistic approach to estimating residential losses from different flood types

Residential assets, comprising buildings and household contents, are a major source of direct flood losses. Existing damage models are mostly deterministic and limited to particular countries or flood types. Here, we compile building-level losses from Germany, Italy and the Netherlands covering a wide range of fluvial and pluvial flood events. Utilizing a Bayesian network (BN) for continuous variables, we find that relative losses (i.e. loss relative to exposure) to building structure and its contents could be estimated with five variables: water depth, flow velocity, event return period, building usable floor space area and regional disposable income per capita. The model’s ability to predict flood losses is validated for the 11 flood events contained in the sample. Predictions for the German and Italian fluvial floods were better than for pluvial floods or the 1993 Meuse river flood. Further, a case study of a 2010 coastal flood in France is used to test the BN model’s performance for a type of flood not included in the survey dataset. Overall, the BN model achieved better results than any of 10 alternative damage models for reproducing average losses for the 2010 flood. An additional case study of a 2013 fluvial flood has also shown good performance of the model. The study shows that data from many flood events can be combined to derive the most important factors driving flood losses across regions and time, and that the resulting damage models could be applied in an open data framework.


Introduction
Floods affect many types of assets, but residential buildings and their contents are usually the most exposed to extreme events due to their sheer number. For example, after the extensive 2016 floods in the Loire and Seine river basins in France, damages to dwellings constituted 68% of the number of all claims and 52% of the total value of losses (Fédération Française de l'Assurance 2017). Similarly, the vast majority of buildings damaged by the 1993 Meuse river flood in the Netherlands were residential buildings, which contributed 38% to total flood losses (Wind et al. 1999). Numerous damage models have been used to predict losses to residential assets. Accurate estimation, especially at the scale of individual buildings, is difficult as it requires good quantification of all three components of flood risk, namely hazard, exposure and vulnerability (Kron 2005;Merz et al. 2010).
Most damage models rely only on water depth, as it is by far the most important determinant of flood losses (Merz et al. 2013; Schröter et al. 2014; Amadio et al. 2019). Additionally, it is usually available from flood hazard analyses. Other hazard variables that are sometimes included are, for instance, flow velocity, inundation duration or level of contamination (Gerl et al. 2016). Different flood types are characterized by different intensities of those parameters. Fluvial (riverine) floods, generated by rainfall or snowmelt, are associated with rather large water depths and long inundation duration, but rather low flow velocities and contamination levels unless a dike is breached. Consequences of pluvial floods from short but intense rainfall are dependent on local conditions. In small, especially mountainous, catchments, they generate high velocities and significant amounts of debris, but are of short duration. When occurring in cities, due to the exceedance of drainage systems' capacity (known as urban floods), rather low water depths are generated. However, velocities could be very high, and inundation duration could be long as well if action is not undertaken to remove the water from low-lying areas and basements. Coastal floods have the potential of causing both extreme water depths and flow velocities due to the mass of water involved combined with waves and, frequently, tides. Contamination from saltwater is another factor specific to this flood type and can contribute significantly to damages. In areas affected by dike or dune breaches, the duration of inundation can be long (Apel et al. 2016; Chen et al. 2010; Kelman and Spence 2004; Webster et al. 2014; Zellou and Rahali 2019). Given all those differences between flood types, it is common to separate flood damage models by individual flood types.
Exposure is the value of assets endangered by floods. Many approaches to estimate the size and economic value of buildings and their contents exist (Figueiredo et al. 2016; Huizinga et al. 2017; Paprotny et al. 2018, 2020a; Röthlisberger et al. 2018). Some damage models directly estimate the absolute value of losses, but others only provide the relative loss (loss relative to exposure), which requires estimating exposure separately. Constructing a damage model from empirical flood loss data also requires obtaining data on exposure. Additionally, variables related to exposure are also used directly in multivariate models, such as building footprint area, presence of basement and building/contents value (Gerl et al. 2016; Wagenaar et al. 2018; Amadio et al. 2019).
Factors influencing flood losses not related to hazard or exposure fall under vulnerability. Those are, for instance, the construction characteristics of buildings, their occupants and external conditions that influence the amount of losses at a given intensity of hazard and amount of exposure. For example, the resistance characteristics of the buildings and use of precautionary/emergency measures are considered particularly important (Thieken et al. 2005; Merz et al. 2010; Van Ootegem 2015; Vogel et al. 2018). Building characteristics include building type (single-family, semi-detached, apartment blocks, etc.), number of floors, quality, material, size and age. Flood precaution or mitigation is related both to the deployment of particular measures (e.g. adapted use of buildings, installation of barriers, use of water pumps, evacuation) and their efficiency that depends also on early warning lead time or occupants' flood experience and knowledge of flood hazard. Flood preparedness is further related to household characteristics like ownership status or income as well as past flood experience related to frequency of flood events (Bubeck et al. 2012, 2018). Vulnerability of buildings can be analysed by modelling the physical processes of flood actions on buildings (Kelman and Spence 2004; Korswagen et al. 2019), but in practice, much simpler methods have to be used as available data about buildings are typically not detailed enough.
Currently, there are several dozen damage models available: 28 were identified for Europe alone by Gerl et al. (2016). All models were created for particular types of floods (river, pluvial, coastal) based on data from particular countries or even particular flood events. This specialization creates a problem of damage model selection when carrying out a flood assessment for a different flood type or country, let alone for a continental or global-scale study. This is further exacerbated since some models provide absolute losses, reducing their transferability, while some of the remainder lack accompanying exposure estimation procedures. Furthermore, most models are deterministic, often in the form of univariate damage functions/curves (Merz et al. 2013; Gerl et al. 2016). Multivariate, probabilistic models are fairly recent (Schröter et al. 2014; Rözer et al. 2019; Wagenaar et al. 2018), but are growing in popularity as they quantify the uncertainty of flood loss predictions. They also enable computing loss-frequency curves for whole portfolios, regions or countries (Schwierz et al. 2010), i.e. the probability that a loss of given magnitude would occur in a broader geographical area rather than a single location. Further, they are increasingly available for reuse, e.g. from Oasis Loss Modelling Framework (2020). The differences between damage models translate into very different predictions of flood losses (Apel et al. 2009; Merz et al. 2010; Bubeck et al. 2011; Jongman et al. 2012; Cammerer et al. 2013; Carisi et al. 2018). At the same time, uncertainty related to hazard intensity was found less important than uncertainty related to exposure or vulnerability (Apel et al. 2009; de Moel et al. 2011; Rojas et al. 2013; Metin et al. 2018). Some limited attempts at an integrated approach were made, such as combining data from multiple events within a country (Merz et al. 2013; Schröter et al. 2014), deploying ensembles of damage models (Figueiredo et al. 2018), creating synthetic pan-European or global models from national models (Huizinga 2007; Huizinga et al. 2017) or analysing the transferability of damage models between countries (Wagenaar et al. 2018).
Apart from progress in statistical techniques employed in damage models, increasing data availability enables new approaches to flood risk estimation. Assessments at various spatial scales, from local to continental, require advancement in several aspects in order to provide comparable, accurate and reproducible results including information on the uncertainty of the outcomes. A flood damage model that could be universally applied to different European countries and flood types should therefore:
• Integrate the different intensities and characteristics of river, pluvial and coastal floods in one model that would be applicable to all types of floods.
• Include consistent valuation of residential assets, including household contents, between countries and regions.
• Combine data from multiple events and countries, so that the model would work in different socio-economic and geographical environments.
Preferably, such a damage model would also be probabilistic to quantify uncertainty and be implementable entirely using openly available datasets.
This paper aims at advancing the current methodologies of vulnerability estimation in flood risk assessments by tackling the above-mentioned goals. The approach presented here involves a building-level probabilistic damage model (Sect. 2.2) created through incorporation of flood loss data from river and pluvial floods in Germany, the Netherlands and Italy from a period of over 20 years (Sect. 2.1). It is validated (Sect. 2.3) not only for the 11 events in the sample (Sect. 3.1), but also for a dedicated case study of a coastal flood in France (Sect. 3.2) and further confirmed with an additional case study of a fluvial flood in Germany (Sect. 3.3). The limitations and uncertainties are discussed (Sect. 4.1) and needs for future work identified (Sect. 4.2).

Data collection and processing
The flood damage model is based on data collected from 11 flood events that occurred in Germany, the Netherlands and Italy between 1993 and 2014. For each flood, a post-disaster household survey was carried out, supplemented by hazard and exposure information from various other sources. Since these floods and related survey datasets have been described before, we refer to the appropriate publications for details and provide only the most relevant information here. A summary of the events is provided in Table 1 together with the information on the extent of impacts and post-disaster surveying efforts. The location of all collected data points (individual surveyed households) is presented in Fig. 1.

Flood events and post-disaster surveys
German floods represent the largest share of events in the dataset. Six fluvial events include floods caused by heavy summer rainfall in 2002 (Engel 2004; Ulbrich et al. 2003), 2005 (Bayerisches Landesamt für Umwelt 2007) and 2010 (Polnisch-deutsch-tschechische Expertengruppe 2010); caused by spring thaw combined with rainfall in 2006 (Bundesanstalt für Gewässerkunde 2006) and 2013 (Schröter et al. 2015); and by snowmelt in 2011 (Axer et al. 2012). The remaining pluvial flood events affected many locations in Germany, but the post-disaster surveys were carried out only in particular cities. The impact of the 2005 flood (Rözer et al. 2016) was surveyed in the towns of Hersbruck (Bavaria) and Lohmar (North Rhine-Westphalia), the 2010 flood (Rözer et al. 2016) in Osnabrück (Lower Saxony) and the 2014 flood in Münster and Greven, both in North Rhine-Westphalia (Spekkers et al. 2017).
Randomly selected households affected by all nine German floods were interviewed by a professional surveying company. The exact questionnaire varied between surveys, but primarily included flood intensity (e.g. water depth, duration and perceived velocity), the use of individual precautionary and emergency measures, building characteristics (e.g. type, age, number of flats, floor space), previous flood experience, the value of damages to building structure and household contents and socio-economic characteristics of the persons interviewed and their households (age, income, number of persons in the household, etc.). For detailed information on the survey methodology in general, we refer to Thieken et al. (2005, 2017), for specific data collection and processing information for the fluvial flood events to Merz et al. (2013) and Schröter et al. (2014), and for the pluvial flood events to Rözer et al. (2016) and Spekkers et al. (2017).

Table 1 Summary of flood events and post-disaster surveys used in the study. Information based on Carisi et al. (2018), Munich Re (2019), Paprotny et al. (2018), Rözer et al. (2016), Spekkers et al. (2017) and Wagenaar et al. (2017, 2018). Total losses are at price levels of 2015 converted using country-specific gross domestic product deflator from Eurostat (2020).
The flood event in the Netherlands in December 1993 was caused by rainfall of long duration in the Meuse river basin over France and Belgium. This led to high river discharge in the bordering Dutch province of Limburg and extensive flooding along a long stretch of the river Meuse (Wind et al. 1999). After the event, the national government compensated the flood damages, and therefore, experts were sent to collect information on every affected household. The resulting dataset was amended by Wagenaar et al. (2017, 2018) with cadastral data and a hydrodynamic simulation. This modified dataset is used in this study. The final event included in the study occurred in Italy in January 2014 and was caused by a structural dike failure along the Secchia river after a period of heavy rainfall (Orlandini et al. 2015). After the disaster, local authorities conducted surveys for the purpose of flood loss compensation. This dataset was then amended by Carisi et al. (2018) with hydrodynamic simulations and exposure estimates and as such is applied in this study.

Fig. 1 Location of the surveyed households (caption fragment: … (2019), rivers from CCM2 dataset (Vogt et al. 2007))

Merging and processing data from flood events
The datasets from the described events were merged and then amended to increase consistency between the various sources. Also, data for variables not recorded in certain surveys were added from external sources, along with new variables. Variables considered in the study for inclusion in the flood damage model (Sect. 2.2.2) are listed in Table 2. It is worth noting that our study focuses on those variables that are available and consistent across all 3 case studies. Furthermore, only continuous variables (as opposed to discrete ones) are considered here as the statistical method used in the study requires specifically continuous variables. In practice, only continuous variables are available for all surveys except for building type. Examples of omitted variables include building age and presence of basement (not available for Italy); number of floors, use of precautionary and emergency measures, household characteristics or contamination of floodwater (only obtainable for Germany, though not for all areas); or various topographical indices such as distance from flood source, which is not applicable to pluvial floods. In this overview, we mostly refer the reader to the original studies for information on the derivation of flood survey data (marked "X" in Table 2) and focus on data added over the course of this study ("X/o" and "o"). Water depth, flow velocity and inundation duration come from two different sources. In the German surveys, the respondents were asked to estimate these quantities; water depth above the highest affected floor was transformed into water depth above ground level based on the number of steps leading to the ground floor and assumptions about basement height (Schröter et al. 2014). As for flow velocity, the respondents assessed it based on a qualitative scale, providing a value from 1 to 6, with half-points possible (Thieken et al. 2005). A value of 0.1 m/s was assigned to each full step of this qualitative scale. 
For inundation duration, the respondents estimated how long their homes were under water, in hours or days. Data on water depths, flow velocity and inundation durations for the Dutch and Italian floods are the result of two-dimensional hydrodynamic simulations described in Wagenaar et al. (2017) and Carisi et al. (2018), respectively.
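The conversion of the qualitative velocity score used in the German surveys can be illustrated with a minimal sketch. It assumes linear scaling, so that a half-point corresponds to 0.05 m/s; the function name `score_to_velocity` and the constant are hypothetical and not part of the original survey processing.

```python
# Assumption (from the text): each full step of the 1-6 scale equals 0.1 m/s.
VELOCITY_PER_STEP_M_S = 0.1


def score_to_velocity(score: float) -> float:
    """Convert a qualitative velocity score (1 to 6, half-points allowed) to m/s."""
    if not 1.0 <= score <= 6.0:
        raise ValueError("score must lie between 1 and 6")
    if (2 * score) != int(2 * score):
        raise ValueError("only full and half points are allowed")
    # Linear scaling is an assumption of this sketch: a half-point adds 0.05 m/s.
    return score * VELOCITY_PER_STEP_M_S


# Example scores at the lower end, the middle and the upper end of the scale.
velocities = [score_to_velocity(s) for s in (1.0, 3.5, 6.0)]
```

Under this reading, the survey-based velocities are bounded between 0.1 and 0.6 m/s, which should be kept in mind when comparing them with simulated velocities from the Dutch and Italian datasets.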
Return periods for German fluvial flood events were computed from multiple gauging stations located along the affected river stretches by Elmer et al. (2010), and hence, return period varies locally within each event. Return periods of the 1993 Meuse and 2014 Secchia floods were also estimated from river gauge records by the authors of the respective case studies (Wagenaar et al. 2018; Carisi et al. 2018). A different approach had to be used for the pluvial flood events in Germany. The return period was computed firstly by obtaining hourly precipitation data in 1 km resolution from the RADOLAN dataset. This dataset is generated by the German weather service by combining precipitation radar and rain gauges (Deutscher Wetterdienst 2018). A total of 13.5 years of data (mid-2005 to end of 2018) was gathered. At each of the 4 affected areas (Hersbruck 2005, Lohmar 2005, Osnabrück 2010, Münster 2014), the RADOLAN grid cell with the highest total precipitation during each event was selected as a basis of calculating intensity-duration-frequency (IDF) curves. An R package IDF v1.1 (Ritschel et al. 2017), using the methodology of Koutsoyiannis et al. (1998), was utilized in this computation. Once the IDF curves were obtained, they were applied to each RADOLAN grid cell that contained affected households from the surveys, generating return periods specific for each data point in the pluvial flood subsample.

Table 2
Variables considered in the study by groups of events and sources. Key: "X": data taken directly from the original studies; "X/o": data taken directly from the original studies, but modified or amended for this study; and "o": data prepared in this study.

The total absolute damages (losses) to building structure and to household contents are estimates of the surveyed residents in the German dataset. In the Dutch dataset, the losses were assessed by damage experts conducting the surveys. The values for the Italian dataset were retrieved from compensation claims submitted to the government. The actual amount of compensation paid was also available for the 2014 Secchia flood; however, it was usually much lower than the claims, largely due to the limited amount of money made available by the government (Carisi et al. 2018). We therefore relied on the value of claims despite possible overestimation of losses. The relative losses were calculated by dividing the absolute losses by the estimated value of the buildings and contents, description of which follows below. It should be noted that both the damage data and exposure estimates discussed below explicitly exclude private vehicles.
Exposure variables are related to the size and value of residential buildings and their contents. This refers to, where possible, the entire affected building and not only to the household surveyed. Usable floor space area of dwellings was recorded in the surveys, except for the 1993 flood, for which it was added from the Dutch cadastre (Wagenaar et al. 2017). The gross (replacement) value of building and household contents is a product of floor space area of the whole building and mean value of the building or contents per m². The datasets differ in methods used to derive the mean value per m². For Germany, we use the estimates included in the source database of the surveys, HOWAS21 (Kellermann et al. 2020), which were computed according to a methodology described by Thieken et al. (2005). The methodology regarding building value per m² is based on insurance industry guidelines (Dietz 1999) and distinguishes various characteristics of the buildings (number of storeys, basement size, roof type) recorded in the original survey data. The valuation of household contents is based on their mean insurance values per household and differentiated spatially using data on postal code-level purchasing power (Thieken et al. 2005).
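The two basic quantities used throughout, gross value as floor space times mean value per m² and relative loss as absolute loss divided by that value, can be sketched as follows. The function names and all numbers are hypothetical illustrations, not values from the survey datasets.

```python
def gross_value(floor_area_m2: float, value_per_m2: float) -> float:
    """Gross (replacement) value as floor space area times mean value per m2."""
    if floor_area_m2 < 0 or value_per_m2 < 0:
        raise ValueError("floor area and unit value must be non-negative")
    return floor_area_m2 * value_per_m2


def relative_loss(absolute_loss: float, exposure_value: float) -> float:
    """Relative loss as absolute loss divided by the exposed value."""
    if exposure_value <= 0:
        raise ValueError("exposure value must be positive")
    return absolute_loss / exposure_value


# Hypothetical building: 120 m2 of usable floor space valued at 1500 EUR per m2.
building_value = gross_value(floor_area_m2=120.0, value_per_m2=1500.0)
# Hypothetical reported structural loss of 27000 EUR.
brloss = relative_loss(absolute_loss=27000.0, exposure_value=building_value)
```

Because loss and exposure enter as a ratio, both need to be expressed in prices of the same year, which is why nominal values for the year of each event are used for this step.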
For the Netherlands, building value was computed with a uniform value per m² due to lack of more detailed valuation data accessible for this country. The building values are used here as provided by Wagenaar et al. (2017). However, in the original Dutch dataset the value of contents was assumed the same in each household irrespective of their size and hence had to be replaced with a better estimate. Consequently, the value of contents was calculated by multiplying the floor space area by standardized contents value per m² based on the methodology described in Paprotny et al. (2020a). The original study covered only the years 2000-2017, and hence, 1993 values were calculated using data listed in Supplementary Table S1. In this method, a timeseries of final household consumption expenditure on certain consumer durables in a country is transformed into the stock of consumer durables using the perpetual inventory method, which is a standard way to compute stocks of assets in economics and accounting. The estimated stock for the whole Netherlands was then divided by the estimated total floor space area of all dwellings in the country to derive a standardized contents value per m².
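The perpetual inventory idea can be illustrated with a simplified sketch. It assumes that every durable good stays in the stock for a fixed service life and is then retired at once; this retirement pattern, the function name `gross_stock` and all numbers are assumptions made for illustration, and the actual procedure of Paprotny et al. (2020a) may differ.

```python
from typing import Mapping


def gross_stock(expenditure_by_year: Mapping[int, float], year: int,
                service_life: int) -> float:
    """Gross stock in `year`: sum of real expenditure still within its service life.

    Assumes simultaneous retirement after `service_life` years (a simplification).
    """
    first_year = year - service_life + 1
    return sum(value for y, value in expenditure_by_year.items()
               if first_year <= y <= year)


# Hypothetical real expenditure on consumer durables (EUR) for a whole country.
expenditure = {1990: 10e9, 1991: 12e9, 1992: 11e9, 1993: 13e9}
stock_1993 = gross_stock(expenditure, year=1993, service_life=3)

# Hypothetical total floor space of all dwellings in the country (m2).
total_floor_space_m2 = 600e6
contents_value_per_m2 = stock_1993 / total_floor_space_m2
```

Multiplying `contents_value_per_m2` by the floor space of an individual building then gives its estimated contents value, so larger dwellings receive proportionally higher contents exposure.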
The original study for the Italian flood used market (depreciated, or net) value of buildings and provided no information on exposure in terms of household contents. To avoid inconsistency with other flood events, we recomputed exposure by multiplying the floor space area with estimates of building value and contents per m² from Paprotny et al. (2020a). That study used national accounts and building construction data to generate timeseries of gross replacement costs of existing dwellings and consumer durables in 30 European countries (2000-2017). In all studied areas, exposure estimates refer to the year of each event. They were derived (from previous studies or calculated here) in nominal prices for the purpose of obtaining relative losses. However, for inclusion of the total building and contents value as explanatory variables, they were expressed in real 2015 prices using variable- and country-specific deflators. The deflators for Germany and Italy were taken from Paprotny et al. (2020a), while in case of the Netherlands, they were extended back to 1993 as a result of exposure estimation carried out in this study.
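Expressing a nominal value in real 2015 prices amounts to rescaling by the ratio of the price index in 2015 to the index in the event year. The following sketch assumes a deflator expressed as an index with 2015 = 100; the function name and the numbers are hypothetical.

```python
def to_real_2015(nominal_value: float, deflator_event_year: float,
                 deflator_2015: float = 100.0) -> float:
    """Convert a nominal value to 2015 prices using a price index (2015 = 100)."""
    if deflator_event_year <= 0:
        raise ValueError("deflator must be positive")
    return nominal_value * deflator_2015 / deflator_event_year


# Hypothetical building worth 200000 EUR in event-year prices, index value 80.
real_value = to_real_2015(nominal_value=200000.0, deflator_event_year=80.0)
```

In practice a separate index would be needed for each variable and country, since buildings and contents follow different price trends.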
Two economic variables at the regional level defined by the Nomenclature of Territorial Units for Statistics (NUTS) were collected to better express local exposure and vulnerability. Gross domestic product (GDP) and net disposable income of households per capita were obtained at NUTS level 3 or 2, depending on availability per variable. The data were collected for the year of each event, which was accessible for all floods in Germany and Italy from Eurostat (2020). In case of the Netherlands, the relevant regional data were only available until 1995 from Statistics Netherlands (2019), and therefore, the 1995 values were extrapolated back to 1993 using national growth rate of GDP and household income per capita. Both variables in all case studies were transformed to real 2015 prices using, respectively, the GDP deflator and the deflator for final consumption expenditure of households, which is a major subcomponent of the household disposable income account.
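The back-extrapolation of the Dutch regional data can be sketched as scaling the 1995 regional value by the national change between the two years. This assumes that the region followed the national trend exactly; the function name and the numbers are hypothetical.

```python
def backcast_regional(regional_base: float, national_target: float,
                      national_base: float) -> float:
    """Scale a regional value from the base year to the target year
    using the ratio of national values (assumes the region follows the national trend)."""
    if national_base <= 0:
        raise ValueError("national base value must be positive")
    return regional_base * national_target / national_base


# Hypothetical per-capita incomes (EUR): region in 1995, nation in 1993 and 1995.
income_1993 = backcast_regional(regional_base=11000.0,
                                national_target=9500.0,
                                national_base=10000.0)
```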
Households were geolocated in case of the German and Italian surveys on the basis of their street addresses, while in the original Dutch dataset, the households were identified only by six-digit postcode. Wagenaar et al. (2017) located them and extracted building characteristics from the cadastre on the basis of comparing modelled and surveyed water depths, taking the building among all in a postcode area that had the smallest difference in water depth between the datasets. This procedure could cause errors in calculating exposure and, consequently, relative losses. However, Wagenaar et al. (2017) consider potential errors to be limited as the very detailed Dutch postcodes typically refer only to a few buildings, usually of similar characteristics. Also, while the usable floor space area and other variables should refer to the whole building, it was found that the German and Dutch datasets do not always consistently record damages and exposure for buildings with multiple households. In a minority of cases, they refer to one of the households at least for some variables. In the Italian dataset, the records are for individual households only, but they could be merged for multi-family houses based on their street addresses. This transformation was done for better consistency with the other two datasets, and therefore, the original 1330 data points were reduced to 782 through merging.
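Merging household records into building-level records can be sketched as a grouping by street address. Which fields are summed is an assumption of this sketch (floor space and both loss components); the field names, addresses and values are hypothetical and do not come from the Italian dataset.

```python
from collections import defaultdict


def merge_by_address(records):
    """Aggregate household records that share a street address into one building record."""
    merged = defaultdict(lambda: {"floor_space_m2": 0.0, "building_loss": 0.0,
                                  "contents_loss": 0.0, "households": 0})
    for rec in records:
        building = merged[rec["address"]]
        # Summing these fields is an assumption; the source does not list them.
        building["floor_space_m2"] += rec["floor_space_m2"]
        building["building_loss"] += rec["building_loss"]
        building["contents_loss"] += rec["contents_loss"]
        building["households"] += 1
    return dict(merged)


# Hypothetical household records; the first two share one multi-family house.
households = [
    {"address": "Via Roma 1", "floor_space_m2": 80.0,
     "building_loss": 10000.0, "contents_loss": 2000.0},
    {"address": "Via Roma 1", "floor_space_m2": 70.0,
     "building_loss": 5000.0, "contents_loss": 1000.0},
    {"address": "Via Po 3", "floor_space_m2": 120.0,
     "building_loss": 8000.0, "contents_loss": 3000.0},
]
buildings = merge_by_address(households)
```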

Bayesian networks
The flood damage model utilizes a class of graphical, probabilistic models known as Bayesian networks (BNs). In recent years, they have been increasingly used for flood risk modelling applications (Paprotny and Morales-Nápoles 2017; Beuzen et al. 2018; Couasnon et al. 2018; Jäger et al. 2018; Wu et al. 2019). Still, BN-based flood damage models have been few, created from flood loss data for Germany (Schröter et al. 2014; Vogel et al. 2018; Paprotny et al. 2020b) and the Netherlands (Wagenaar et al. 2017, 2018). They used BNs for discrete variables together with algorithms for automated set-up of the models. In contrast, we apply here a nonparametric BN for continuous variables to create an expert knowledge-driven model. This particular variant of BNs is known as nonparametric due to the use of empirical marginal distributions and hence does not require assuming any continuous marginal distribution or discretizing the data as in discrete BNs. This method was originally introduced by Kurowicka and Cooke (2006). Compared to other possible methods, nonparametric BNs have several advantages, as they:
• are probabilistic rather than deterministic, thus providing uncertainty bounds of the predictions (in contrast to multivariate regressions);
• utilize continuous variables without assuming any marginal distribution and without the need for discretizing or normalizing the data, which can significantly alter the results;
• can be quantified with data that have partially missing values (not possible in discrete BNs or, e.g. random forests), and also with relatively small datasets;
• can be applied, after quantification, in situations where some of the variables used for conditionalizing the BN are not available at all;
• are graphical and, in contrast to machine learning methods, could be easily presented in their entirety;
• depend for their quality solely on the data and the model structure, as there are no tuning parameters, which are numerous in machine learning methods.
A Bayesian network is "a directed acyclic graph, together with an associated set of conditional probability distributions" (Hanea et al. 2006). It consists of two elements: nodes, which are random variables represented by marginal distributions, and arcs, which indicate the dependency structure of the model. The node on the upper end of an arc is known as the "parent", and the node on the lower end is the "child". The joint probability density $f(x_1, x_2, \ldots, x_n)$ is defined as follows:

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f\left(x_i \mid x_{pa(i)}\right)$$

where $pa(i)$ is the set of parent nodes of $X_i$. A BN is applied to give predictions for a particular case through updating the probability distribution of child nodes given new evidence at parent nodes. To quantify a defined structure of nodes and arcs in a BN, 2 elements are required, namely the marginal distributions and a representation of the dependency at each arc. Here, we use empirical (nonparametric) margins and normal (Gaussian) copulas as a dependency model. There are many copula types (see, for example, Joe 2014), and hence, we validate this assumption by analysing the fit of several copula types with a "Blanket Test" (Genest et al. 2009). The statistic $M$ for a sample of length $n$ is computed as follows:

$$M_n(u) = n \sum_{|u|} \left\{ \hat{C}_{\hat{\theta}_n}(u) - B(u) \right\}^2, \quad u \in [0,1]^2$$

where $B(u) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(U_i \le u)$ is the empirical copula and $\hat{C}_{\hat{\theta}_n}(u)$ is a parametric copula with parameter $\hat{\theta}_n$ estimated from the sample. This goodness-of-fit test shows that the empirical copulas of 7 out of 14 (unconditional) variable pairs chosen for the model are best modelled by a Gaussian copula (Supplementary Figures S1-S2). Further, Morales-Nápoles et al. (2014) and Hanea et al. (2015) postulated that the joint distribution of a given nonparametric BN structure is uniquely determined. Hence, they proposed a "d-calibration" test to check the validity of a normal copula for a particular BN structure, which is presented in Supplementary Figure S3.
The determinant of the empirical rank correlation matrix of the selected variables falls within the 90% confidence interval of the determinant of an empirical normal distribution. This means that a normal copula is a valid assumption for the joint distribution of the variables. On the other hand, the determinant of the rank correlation matrix of the final BN model is outside the 90% confidence interval of the determinant of the random normal distribution sampled for the same correlation matrix. This indicates that the joint normal copula is not valid for the particular configuration of the BN, though this d-calibration test is rather severe. Knowing this possible limitation, we nonetheless use the Gaussian copula, which has only one parameter, quantified using Spearman's rank correlation coefficient. The margins of a copula are uniform [0,1] distributions, created by transforming data into ranks. For detailed information on conditionalizing a nonparametric continuous BN with a Gaussian copula, we refer to Hanea et al. (2015).
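To make the role of the Gaussian copula and the empirical margins more concrete, the following minimal sketch conditionalizes a single parent-child pair. It is a two-variable simplification, not the full multi-node procedure of Hanea et al. (2015). The rank correlation is converted to the correlation of the underlying normal variables with the standard relation $\rho = 2\sin(\pi r_s/6)$, and the function names, sample values and rank correlation are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def spearman_to_pearson(r_s: float) -> float:
    """Product-moment correlation of the normal copula for a given rank correlation."""
    return 2.0 * math.sin(math.pi * r_s / 6.0)


def empirical_cdf(sample, x: float) -> float:
    """Empirical CDF value, kept strictly inside (0, 1) for the normal transform."""
    n = len(sample)
    count = sum(1 for s in sample if s <= x)
    return min(max(count, 1), n) / (n + 1)


def empirical_quantile(sample, p: float) -> float:
    """Inverse of the empirical margin: the order statistic at probability p."""
    ordered = sorted(sample)
    return ordered[min(int(p * len(ordered)), len(ordered) - 1)]


def conditional_quantile(parent_sample, child_sample, r_s: float,
                         parent_value: float, p: float) -> float:
    """p-quantile of the child given an observed parent value (bivariate case)."""
    rho = spearman_to_pearson(r_s)
    z_parent = STD_NORMAL.inv_cdf(empirical_cdf(parent_sample, parent_value))
    # In normal space the child is N(rho * z_parent, 1 - rho^2) given the parent.
    z_child = rho * z_parent + math.sqrt(1.0 - rho ** 2) * STD_NORMAL.inv_cdf(p)
    return empirical_quantile(child_sample, STD_NORMAL.cdf(z_child))


# Hypothetical samples: water depth (m) and relative building loss (-).
depths = [0.1, 0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
losses = [0.01, 0.02, 0.05, 0.08, 0.1, 0.2, 0.3, 0.5]
median_shallow = conditional_quantile(depths, losses, 0.6, parent_value=0.3, p=0.5)
median_deep = conditional_quantile(depths, losses, 0.6, parent_value=2.5, p=0.5)
```

Evaluating several values of `p` for the same parent value gives an uncertainty band around the conditional median, which is the probabilistic output that distinguishes this approach from a deterministic damage curve.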

A Bayesian network-based flood damage model
The flood damage model for flood loss estimation was created from the variables listed in Table 2. The variables of interest are relative building and contents losses, with a total of 9 variables to potentially explain their distributions, 4 related to hazard and 5 to exposure. An unconditional rank correlation matrix was computed (Supplementary Table S2) to identify the pair of variables with strongest correlation to begin configuring the model. Unsurprisingly, it was the correlation between water depth and relative building loss. Further arcs between variables were chosen based on the conditional rank correlations and theoretical explanations of the various dependencies, which are given below. Different configurations of the model were tested for various compositions of the sample. The final Bayesian network correlation matrix is shown in Supplementary Figure S4. Given many gaps in the German data and the size of the Dutch dataset, the analysis was mainly done with a reduced sample, in which the 2 datasets have similar number of records. For that purpose, all available German and Italian data were combined with a random sample of 40% of the Dutch data. This achieved a more balanced and representative sample with 7692 records (5091 with complete information for all variables). It should be noted that as the actual number of records used at each step is the intersection of availability of given two variables, it varies depending on the pair of variables considered. Influence of the sample choice is discussed in the results (Sect. 3.1). The procedure resulted in a model that adopts five explanatory variables (Fig. 2) for describing relative losses to buildings (brloss) and content (crloss). 
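The construction of the balanced sample and the pairwise use of available records can be sketched as follows. The record layout, variable keys, seed and function names are hypothetical; only the 40% share of the Dutch data is taken from the text.

```python
import random


def balanced_sample(german, italian, dutch, dutch_fraction=0.4, seed=42):
    """Combine all German and Italian records with a random share of the Dutch records."""
    rng = random.Random(seed)  # fixed seed so that the subsample is reproducible
    n_dutch = round(dutch_fraction * len(dutch))
    return list(german) + list(italian) + rng.sample(list(dutch), n_dutch)


def pairwise_complete(records, var_a, var_b):
    """Keep only records where both variables of a pair are available."""
    return [r for r in records
            if r.get(var_a) is not None and r.get(var_b) is not None]


# Hypothetical records with water depth (wd) and relative building loss (brloss).
german = [{"wd": 0.5, "brloss": 0.10}, {"wd": None, "brloss": 0.20}]
italian = [{"wd": 1.2, "brloss": 0.30}]
dutch = [{"wd": 0.1 * i, "brloss": 0.01 * i} for i in range(1, 11)]

sample = balanced_sample(german, italian, dutch)
pairs = pairwise_complete(sample, "wd", "brloss")
```

Here the number of usable records differs between the full sample and the pair (`wd`, `brloss`) because one record lacks a water depth, mirroring how the effective sample size varies with the pair of variables considered.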
The various dependencies represented in the model are explained as follows:
• Water depth (wd) is correlated with relative losses, as higher water levels affect a greater proportion of the structure of a building and can reach a higher share of the contents inside, which are located on different floors (including the basement) and at different heights above the floor. Water depth was found to be the most important factor explaining flood losses in many multivariate analyses (Merz et al. 2013; Schröter et al. 2014; Wagenaar et al. 2017; Rözer et al. 2019; Amadio et al. 2019). Water depth itself is influenced by the return period, as rarer events involve higher precipitation or discharges, therefore having more potential to cause high water depths.
• Velocity (v) further adds to the losses, as the hydrodynamic action of the water adds to the pressure on objects. Also, potentially damaging debris is more likely to be carried by faster-moving water (Kelman and Spence 2004). Though velocity was found not necessarily relevant for loss modelling for the German fluvial events (Vogel et al. 2018), it was significant for the Italian event (Amadio et al. 2019). Velocity is correlated with water depth, as, for example, fluvial events typically have both higher water depths and higher velocities than pluvial floods.
• Return period of the flood event (rp) is the final hazard variable included. Apart from higher flood intensities, it may also represent a vulnerability component: a flood with a low probability of occurrence will also affect areas that are rarely flooded, and hence have a lower level of preparedness or flood experience. Past occurrences of flooding are a strong predictor of the use of private precautionary measures (Bubeck et al. 2012). Higher vulnerability for areas affected by floods with higher return periods was noted, for example, in Elmer et al. (2010), Merz et al. (2013) and Wagenaar et al. (2018).
• Floor space area (fsb) is the only explanatory variable negatively correlated with relative losses. A building with large floor space is more likely to have multiple storeys, and therefore, a smaller share of the assets is exposed to floodwater, thus reducing losses relative to exposure. Lower vulnerability of larger buildings was indicated, for example, by Kok et al. (2005).
• Regional disposable income per capita represents the general wealth of the population in the affected areas. A positive correlation indicating bigger losses for richer regions could be explained by the higher value of buildings and contents compared with the national average, which was mostly used to determine exposure for the purpose of calculating relative losses (Paprotny et al. 2020a). In other words, it could indicate underestimation of exposure for wealthier regions, but also higher vulnerability of the type of buildings and contents typical for such regions. Additionally, regional income influences floor space area, as richer regions are generally more urbanized, as shown by economic data by urban-rural typology from Eurostat (2020). Therefore, they are more likely to contain multi-family buildings with large floor space.
• Relative building loss (brloss), after including all previous factors, is still highly correlated with relative contents loss (crloss). Buildings are directly exposed to floods, while damages to contents require water entering the building. Consequently, high intensity of the flood and large damages to the building (including to service equipment located therein) will result in losses to contents as well (Carisi et al. 2018).
Fig. 2 Only complete records from a balanced sample of German, Italian and 40% of Dutch data were used to compute the histograms and correlations shown. Graph generated using Uninet software (Hanea et al. 2015)

Validation case study: 2010 coastal flood in France
The post-disaster surveys collected for this study did not include any instance of coastal inundation, which has different hazard characteristics. Therefore, we collected additional data to recreate residential losses during the 2010 coastal flood in France, which was triggered by the extra-tropical storm Xynthia. Strong winds caused widespread damage in France and other countries. Sixty-five deaths were recorded, including 47 in France, of which 41 occurred in the coastal flood in the Vendée and Charente-Maritime departments (Kolen et al. 2013; Vinet et al. 2012). The inundation on 28 February 2010 resulted from a storm surge (up to 1.6 m) in phase with a high spring tide and waves. 195 km of flood defences were breached or damaged and inundation depths reached up to 2.5 m (Lumbroso and Vinet 2011; Bertin et al. 2012). In the residential sector, an estimated total of 19,000 insurance claims were filed to the amount of 450 million euro, which amounts to 23,700 euro per household (FFSA/GEMA 2011). This excludes losses to cars or households affected only by the windstorm. In the most affected areas of the Charente-Maritime and Vendée departments (Fig. 3), there were approximately 8560 and 4970 claims, respectively, worth 252 and 155 million euro (FFSA/GEMA 2011). André et al. (2013) found that losses to contents equalled 40-50% of losses to building structure, and hence, by taking the middle of this estimated range, we can assume the ratio of building to contents loss to be 69:31 in the observed losses. Affected residential assets were identified firstly by downloading building polygons from OpenStreetMap (2019). Where the function of a building was not stated, the land use layer from the same source was used to derive the occupancy. Very small buildings (less than 20 m² footprint area) were excluded, so that cottages, garages and other constructions unlikely to be houses would not appear in the analysis.
Buildings located within the observed flood extent (as shown in Bertin et al. 2014) were selected (9008 in total) and the floor space area of each house was obtained using the prediction model from Paprotny et al. (2020a). Finally, the size of each building was multiplied by the estimated value of residential assets in France in 2010 from the same source: 1561 euro per m² for buildings and 291 euro per m² for contents (see also Supplementary Table S5). Water depths from hydrodynamic simulations covered around two-thirds of identified buildings (5995). Given that buildings in the affected municipalities contained an average of 1.13 households in 2010 (Eurostat 2020), the analysis covered an estimated 6774 households, i.e. slightly less than half of the number of claims in the two most affected departments and a third of the total number of claims related to flooding from the Xynthia storm.
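The arithmetic behind the figures quoted for the French case study can be retraced in a few lines. The sketch below uses only numbers stated in the text; the function names and the 100 m² example house are hypothetical and serve solely as an illustration.

```python
# Unit values for France in 2010 quoted in the text (euro per m2 of floor space).
BUILDING_VALUE_PER_M2 = 1561
CONTENTS_VALUE_PER_M2 = 291


def exposure(floor_space_m2):
    """Exposed value of building structure and of contents for one building."""
    return (floor_space_m2 * BUILDING_VALUE_PER_M2,
            floor_space_m2 * CONTENTS_VALUE_PER_M2)


def loss_shares(contents_to_building_ratio):
    """Split total loss into building and contents shares for a given ratio."""
    total = 1.0 + contents_to_building_ratio
    return 1.0 / total, contents_to_building_ratio / total


def absolute_loss(relative_loss, exposure_value):
    """Convert a relative loss (0-1) into a monetary loss."""
    return relative_loss * exposure_value


# Contents loss equal to 45% of building loss (middle of the 40-50% range).
building_share, contents_share = loss_shares(0.45)

# Households represented by the 5995 buildings with water depth information.
households = round(5995 * 1.13)

# Average claim: 450 million euro over 19,000 claims.
average_claim = 450e6 / 19000

# Hypothetical house with 100 m2 of usable floor space.
building_exposure, contents_exposure = exposure(100.0)

print(round(building_share * 100), round(contents_share * 100))
print(households, round(average_claim, -2))
```

Multiplying the relative loss predicted by the damage model by such exposure values, as in absolute_loss, gives the monetary loss per building that is then summed over the flooded area.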
Water depths were taken from Bertin et al. (2014), who reanalysed the 2010 event using the 2D hydrodynamic model SELFE, fully coupled with the spectral wave model WWMII (Roland et al. 2012). The implementation included an unstructured grid with a resolution ranging from 30 km to 5 m incorporating detailed topography and bathymetry from lidar scanning and echo sounding and forced with 0.10°-resolution meteorological data. The results had a good match with observed extents. However, the model did not include dike breaches, and some of the most affected areas were not shown by the model as inundated. To reduce this inaccuracy, we combined the results of Bertin et al. (2014) with a study dedicated specifically to the flooding of La Faute-sur-Mer municipality, which was the most severely affected area (29 out of the 41 deaths from inundation were recorded there). A simulation by Huguet et al. (2018), which was a modification of the model set-up from Bertin et al. (2014), provided a much more precise reanalysis of the flood in La Faute-sur-Mer due to higher resolution and improved data on height of flood defences. Unfortunately, information on flow velocity was not available, and hence, this node of the Bayesian network was left unconditionalized, i.e. the prior distribution from our sample was assumed in all cases of flooded buildings in the 2010 event. The return period of the event was set to 270 years, as estimated by Bulteau et al. (2015). Household income per capita as of 2010 was obtained from Eurostat for affected NUTS2 regions Pays de la Loire (FR51) and Poitou-Charentes (FR53). The mean income amounted to 18,671 and 18,566 euro in 2015 prices, respectively.

Additional case study: 2013 fluvial flood in Saxony
To check whether the results obtained in the Xynthia case study are not incidental, we carried out an additional application of the BN model. In this case study, the affected area is somewhat more familiar to the model, as some of the survey data include the 2013 fluvial flood in Germany. Here, we aim to reproduce total residential losses recorded during this event in Saxony. Some 13,000 households were affected by the event in this federal state. The state government supported private households and non-profit institutions (associations, churches) with 277 million euro.
The flood extent and water depths were derived through intersection of recorded floodwater elevations from aerial scanning, carried out by the German Federal Institute of Hydrology during the event (Bundesanstalt für Gewässerkunde 2015), and a 10 m digital elevation model from the Federal Agency for Cartography and Geodesy (Bundesamt für Kartographie und Geodäsie 2015). However, this product is limited to the biggest rivers along which the flood occurred, namely Mulde and Elbe (Fig. 4).
Exposure during the 2013 event was estimated firstly by obtaining OSM data (OpenStreetMap 2019) for five counties of Saxony covered by the hazard data (Dresden, Kreisfreie Stadt; Meißen; Sächsische Schweiz-Osterzgebirge; Leipzig; Nordsachsen). As in the French case study, where the function of a building was not stated, the land use layer from the same source was used to derive the occupancy. Very small buildings (less than 20 m² footprint area) were excluded, so that cottages, garages and other constructions unlikely to be houses would not appear in the analysis. The floor space area of each house was obtained using the prediction model from Paprotny et al. (2020a, b, c). In total, 4831 residential buildings were identified as flooded and included in the analysis. In Saxony in 2013, the average number of dwellings per residential building was 2.79 (Statistisches Bundesamt 2020), and hence, the buildings represent approximately 13,478 households, close to the number actually affected. The estimated exposure in Germany in 2013 per m² was 2002 euro for structure and 386 euro for contents (Paprotny et al. 2020a, b, c). An alternative exposure computation, which we use for comparison of the results, was again taken from JRC (Huizinga et al. 2017). It indicates a replacement cost per m² of 2296 euro for structures and 1148 euro for contents.
Finally, data for the damage model were collected. As in the French case study, information on velocity was not available. The return period was spatially variable and ranged from 12 to 457 years. It was drawn from the return periods computed for the German survey data and assigned to OSM buildings according to proximity. Household income per capita as of 2013 was obtained from Eurostat (2020) for affected NUTS2 regions Dresden (DED2) and Leipzig (DED5). In both cases, the mean income was the same and amounted to 18,154 euro in 2015 prices.

Performance indices and comparative flood models
Predictions of relative losses to buildings and household contents are compared with observations using several error metrics (Moriasi et al. 2007; Wagenaar et al. 2018): Pearson's coefficient of determination (R²), mean absolute error (MAE), mean bias error (MBE), symmetric mean absolute percentage error (SMAPE) and root-mean-squared error (RMSE). SMAPE normalizes MAE by considering the absolute values of predictions and observations, with a value close to 0 indicating small error compared to the variability of the phenomena in question. Equations for the listed measures are shown in Supplementary Table S3. For validation purposes, we use the predictions as mean (expected) values of the uncertainty distribution of the variables of interest for each data point (building). Uncertainty ranges are provided for the prediction of total losses per event, including the validation case study described in Sect. 2.3.1. Results of the BN model for the validation case study were compared with 10 alternative models (Supplementary Table S4). Six of the collected models are simple univariate damage curves (Hydrotec 2001; ICPR 2001; Huizinga 2007; Klijn et al. 2007; Luino et al. 2009), including at least one per country covered by flood loss surveys. (No model was identified for France.) One model created specifically for coastal floods (Reese et al. 2003), MERK, provides curves for 4 different construction types, and therefore, we use an average of those linear models. Two further models are in the form of look-up tables (MCM and FLEMOps+). MCM provides only absolute damages for present-day UK (Penning-Rowsell et al. 2013), and therefore, the MCM damage functions had to be recalculated to conform with our exposure estimates. The damages were transformed into losses in France (2010) and Saxony (2013) using a ratio between the estimated exposure per m² of floor space in the respective case studies (Sects. 2.3.1 and 2.3.2) and exposure per m² in the UK for 2017 taken from Paprotny et al. (2020a). For the purpose of this analysis, we did not consider building age and social grade of occupants when applying the MCM model, as this information was not available. FLEMOps+ (Büchele et al. 2006) was implemented assuming medium quality of buildings in all cases, while contamination and use of precautionary measures were not considered due to lack of data.
Another pair of models uses two different data-mining methods. RF-FLEMOps utilizes ensembles of regression trees, a method known as random forests (Merz et al. 2013). Because the method cannot handle missing data, we had to adapt the RF-FLEMOps model to operate with four variables only (water depth, return period, floor space and building/contents value) instead of the original 13. This was done by rerunning the original data-mining algorithm on the original dataset with only those four variables. BN-FLEMOps is a discrete Bayesian network-based model with seven variables predicting either building or contents loss (Wagenaar et al. 2018). As noted earlier, a BN-based model can also work with missing data, and hence, BN-FLEMOps was applied without modifications to all study areas.
Alternative exposure estimates were obtained from Huizinga et al. (2017). That study shows values of residential buildings per m² taken from two external construction cost surveys, of which one provides data for all four countries analysed in this paper. As the exposure values are provided in 2010 prices, they were adjusted to the year of each flood event using an appropriate residential building price index. To estimate contents value, Huizinga et al. (2017) suggested taking half of the value of residential buildings. The results, which will be referred to hereafter as "JRC exposure", are provided in Supplementary Table S6. In general, JRC exposure is rather closely aligned to estimates for buildings from Paprotny et al. (2020a), which will be referred to as "GFZ exposure", but much higher for household contents (Supplementary Table S5).
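The error metrics named at the start of this section could be computed along the lines of the sketch below. The authoritative equations are those in Supplementary Table S3, which is not reproduced here. In particular, the SMAPE variant used (sum of absolute errors divided by the sum of absolute predictions and observations) is an assumption chosen to match the description of SMAPE as a normalized MAE, and R² is taken as the squared Pearson correlation. All function names are illustrative.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mbe(obs, pred):
    """Mean bias error; positive values indicate overestimation."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def smape(obs, pred):
    """Symmetric mean absolute percentage error (assumed variant, see lead-in)."""
    numerator = sum(abs(p - o) for o, p in zip(obs, pred))
    denominator = sum(abs(p) + abs(o) for o, p in zip(obs, pred))
    return numerator / denominator


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)  # undefined if either series is constant


# Hypothetical relative losses (fractions of exposure) for three buildings.
observed = [0.1, 0.2, 0.3]
predicted = [0.2, 0.2, 0.4]
print(mae(observed, predicted), mbe(observed, predicted),
      rmse(observed, predicted), smape(observed, predicted),
      r_squared(observed, predicted))
```

MAE, MBE and RMSE are expressed in the units of relative loss, which is why they scale with the typical size of losses in an event, whereas SMAPE is dimensionless.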

Flood damage model validation
In the analysis of the results from the BN-based model, we focus on the variant using a "balanced" sample consisting of all German, Italian and 40% random sample of Dutch data (see Sect. 2.2.2), which provided the best predictions overall. The basic validation results are shown in Table 3, while details for 8 different sample sources are collected in Supplementary Table S7. The coefficient of determination (R²) is mostly low, below 0.3, as is also evident from the scatterplots for all areas combined (Fig. 5). The value of this metric, similarly to the mean absolute error (MAE) and mean bias error (MBE), is to a large extent proportional to the variation in observed relative losses. Hence, the relatively low MAE for the German pluvial floods and the 1993 Meuse flood is largely due to typically small relative loss compared to most of the German fluvial floods (see Supplementary Table S8 for average values). Using the symmetric mean absolute percentage error (SMAPE) reveals that the worst performance of the model was recorded for the German pluvial floods of 2005 and 2010 as well as the fluvial flood of 2011. However, the size of the error is further determined by the number of data points for each individual flood. German river floods of 2002, 2010 and 2013 are much more heavily represented in the dataset, and hence, they have better R² and SMAPE values, and lower bias, compared to other German events. In effect, the total loss for the surveyed households is more accurately represented, though the 2005 fluvial flood had the lowest error in modelling total contents loss (Table 3 and Fig. 6).
Table 3 Validation results (fivefold cross-validation) of the model for different flooded areas, using all nodes of the flood damage model and a "balanced" sample. N: number of observations used for validation; brl: relative loss to building structure; crl: relative loss to household contents
Also, the German pluvial flood of 2014 has more data points available and as a consequence is more accurately modelled than the 2005 and 2010 events. However, the 1993 Dutch flood is largely overestimated despite a large quantity of available data. Total building losses for the 2014 Italian flood are accurately represented, but the performance for individual households or prediction of total contents loss is only similar to the model average. In general, modelled losses to building structure have lower average errors compared to household contents, but the relative losses to buildings are also lower. The value of SMAPE is mostly similar for both relative building and contents loss. Yet, predictions of overall losses to contents are better for 7 out of 11 study areas and also for all areas combined. This difference is particularly noticeable for the German pluvial floods and smaller river events. Also, the 95% uncertainty ranges of the modelled estimates of building loss cover observed totals for only 5 out of 11 events, while for contents, it is 7 out of 11 (Fig. 6).
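Checking whether an observed total falls inside a 95% uncertainty range can be illustrated with the sketch below. It assumes that the range is taken as the 2.5th to 97.5th percentile of a set of sampled total losses per event; how the ranges in Fig. 6 were actually derived is not specified in this section, so this is only one plausible reading. The function names and all data are hypothetical.

```python
from statistics import quantiles


def uncertainty_range(draws):
    """Return the 2.5th and 97.5th percentiles of sampled total losses."""
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]


def coverage(events):
    """Count events whose observed total lies inside the modelled 95% range."""
    covered = 0
    for observed_total, draws in events:
        lower, upper = uncertainty_range(draws)
        if lower <= observed_total <= upper:
            covered += 1
    return covered, len(events)


# Hypothetical sampled totals for one event and two hypothetical observations.
example_draws = list(range(1, 1001))
print(uncertainty_range(example_draws))
print(coverage([(500, example_draws), (990, example_draws)]))
```

A coverage well below 95% of events, as reported for building losses, suggests that the uncertainty captured by the model is narrower than the actual spread between events.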
The choice of sample can influence the results significantly. Whereas randomly removing part of the Dutch data has marginal influence on the results (Supplementary Table S7 and Supplementary Figure S6), the results show that individual case studies have limited transferability and need to be pooled together. Quantifying the model only with German river floods mostly results in higher loss estimates, while using German pluvial floods leads mostly to underestimation, though the predictions for pluvial floods themselves become much more accurate. Also, German pluvial flood data are a much better predictor of losses during the 1993 Meuse flood than German river flood data. On the other hand, the data from German events tend to underestimate losses during the Italian event.
Different exposure estimates (from original surveys, GFZ exposure and JRC exposure, see Sect. 2.1.2) can also affect the results (Supplementary Table S9). Replacing exposure estimates discussed so far with alternative GFZ estimates improves slightly the predictions for the German pluvial events and has the opposite effect on predictions for fluvial floods. The effect on the Dutch and Italian floods is limited. Using JRC exposure estimates vastly increases error in predicting total contents loss, but has limited effect on building loss predictions.
Finally, in the context of the 2010 coastal flood case study, the influence of not using velocity information was analysed (Supplementary Table S9). The model's accuracy becomes lower, resulting in higher predictions for the Netherlands and lower for German river floods. The Italian event and the German pluvial floods are only marginally influenced when the velocity node of the BN model is not conditionalized.

Application of the model to the validation case study
The BN model underestimates losses recorded in the Charente-Maritime and Vendée departments during the 2010 coastal flood. The modelled losses of 163 million euro are significantly less than the 408 million euro indicated in insurance claims (Table 4). However, this is mostly due to undercoverage of affected buildings (less than half were identified). Average losses are closer to what was reported, with underestimation for Charente-Maritime and Vendée of 17% (uncertainty range 13-20%) and 30% (24-35%), respectively. Also, the average loss per household indicated by the model is close to the average for all households affected by flooding (within the uncertainty of the BN predictions). Among the different choices of sample for the BN model, the "balanced" sample performs best considering all areas affected by the flood, while for the two French departments, the German fluvial data provide the most accurate estimate, also in combination with the pluvial or Dutch data (Supplementary Table S10). The BN indicates the same degree of error in predicting both building and contents loss under the GFZ exposure (Table 5).
The performance of the BN and 10 alternative flood damage models varies substantially (Table 5). The BN has the best result for reproducing the average losses for the flood, though certain models work better for the most affected two regions, especially the 3 FLEMO models. Those models use the German fluvial flood data; therefore, their performance is similar to the BN model from this study run with the same sample. The only model specifically made for coastal floods, MERK, has the second-best result for the whole flooded area. However, this is largely because the significant underestimation of losses to buildings is compensated by even bigger overestimation of losses to household contents. MCM provided rather accurate predictions of contents loss, but overestimated building loss. Univariate damage curves which do not distinguish between buildings and contents provide mostly very inaccurate predictions of the total losses.
Using different exposure estimates has considerable impact on the results (Fig. 7). Using JRC estimates instead of GFZ's, the predicted losses are much higher, to the extent that all but one model overestimate losses from the coastal flood. The BN model from this study gives best predictions for the most affected regions under those exposure estimates. Yet, as with MERK, MCM and FLEMO models, it overshoots contents loss multiple times. This is due to the very high exposure values for contents in the JRC data. The impact of those estimates on all damage models gives greater confidence in the GFZ exposure data.

Application to the 2013 flood in Saxony
The second case study involves the 2013 fluvial flood in Saxony (Germany), and some data points from this event are included in the German survey. Nonetheless, using compensation data from the state government, we could analyse whether the BN and alternative models can recreate the average loss to households during the event using openly available data on exposure.
The results are presented in Table 6 and Fig. 8. The BN model overestimated the average losses by 12% (95% uncertainty range 7-18%). One model (ICPR) achieved the same result. All other models indicated at least 60% more losses than the observed average. As for the 2010 flood in France, all models give higher predictions under JRC exposure estimates than when using GFZ's valuations. Still, the BN model performs best under those circumstances, whereas other models at least double the observed losses. This confirms the results of the main validation case study, namely that the BN model and GFZ exposure estimates achieve better results than the other published models (Fig. 8).

Uncertainties and limitations
The Bayesian network-based flood damage model includes uncertainty related to the methodology and input data. Methodologically, using continuous variables only may exclude important discrete variables used in other flood models, such as building type and quality, water contamination, use of precautionary or emergency measures and social characteristics of the household occupants. On the other hand, the availability of such data, both across the datasets and in case studies such as the 2010 coastal flood, is limited. The only discrete variable obtainable for all datasets and the case study is building type, which is related to the usable floor space area used in the model. The use of nonparametric marginal distributions has the benefit of avoiding making assumptions about the distribution shape or discretizing the data. However, the limitation of the method is the assumption of a Gaussian copula as the dependency model, which does not account for any possible tail dependencies. Improvements to nonparametric BNs would be needed to include more flexible dependency structures. The heterogeneous input datasets are a large source of uncertainty. The German survey datasets rely almost entirely on the respondents' recollections of the flood event, which could be inaccurate in describing the hazard component (especially flow velocity). In all cases, the data could be uncertain due to the time elapsed between the flood and survey-taking activities (1-2 years). The Dutch and Italian surveys were taken shortly after the flood, but also rely on the output of hydraulic models to extract flood hazard data. In the Dutch dataset, the limited accuracy of geolocation information further compounds the possibility of errors related to the flood hazard component. Merging of the datasets is problematic as, for example, German data include water depths lower than zero to represent basement flooding, in contrast to the other 2 data sources, where water levels are always above the terrain.
Transformation of flow velocity from German data and computation of return periods for fluvial and pluvial events are further sources of potential errors. Usable floor space area represents the whole building rather than individual households, though in a minority of cases, it most likely refers to particular households of multifamily buildings only. Yet, inconsistencies and uncertainties are always present in flood loss datasets, so efforts to collect bigger and improved data should continue internationally also to increase the amount of available data points.
Modelled flood losses from the 2010 coastal flood were underestimated for the 2 French departments that were most affected. This is possibly to a large degree due to limitations in the hazard and exposure data. Firstly, more than half of the households which submitted insurance claims for flood losses were not identified with the available data. This is partially caused by inaccuracies of the hazard data, where many affected locations were not shown as inundated. Additionally, the OpenStreetMap building dataset does not capture many inundated residences, as 1510 houses were demolished in the aftermath of the 2010 flood by the French government (Lumbroso and Vinet 2011). Those buildings represented the most severely affected households, hence pushing down the average modelled losses compared to observations. Also, the velocity node was not used as the information was not available for the case study. Finally, the potentially significant effect of saltwater intrusion was not included in the model, as such information was not available for the fluvial and pluvial events the damage model was based on. For the 2013 Saxony case study, the uncertainty of the results mostly stems from hazard data (lack of velocity data or coverage over the whole affected area) and the problem of uniformizing the reference units (buildings vs individual households) for both observed and modelled losses.
The results of the BN model were contrasted with those from other models only for the case study. Comparing all models for the 11 flood events in the study would not be a fair comparison, because only some models were trained with the datasets included here, and none of them with all of them together. Nonetheless, some insights could be drawn from it (Supplementary Tables S11 and S12, Supplementary Figure S7). Models using a logarithmic-type damage curve (Damage Scanner, HWS-GIS, Boesio, JRC) vastly overestimate losses in almost all cases, and the exponential model ICPR grossly underestimates those. Some models, ICPR, MERK and MCM, predict total losses from the German fluvial events very well. However, in those cases large errors in estimating building losses were simply compensated by errors of opposite sign in estimating contents loss. All underestimate losses from pluvial events. The three multivariate FLEMO models (based on German fluvial events) mostly overpredict losses, especially to building structure. The inaccuracy is usually lower than in other models, but not compared to the BN from this study. All models are particularly inaccurate in recreating the 1993 Meuse flood.

Future outlook
The model could be further developed by expanding the input data with more flood events. Post-disaster surveys were carried out, e.g. in France (Poussin et al. 2015) and the UK (Defra/Environment Agency 2004). Microdata from insurers could also be used, as, for example, in case of the 2008 and 2010 coastal floods in France (André et al. 2013). The main obstacles in using these datasets are obtaining them together with geolocation, while at the same time protecting the sensitive nature of the data containing responses of household occupants. Also, each individual dataset contains different variables and/or their definitions, requiring more effort to homogenize the data.
The model will be tested further in the framework of the EIT Climate-KIC Demonstrator project "SaferPLACES". Currently, four urban case studies representing fluvial, pluvial and coastal floods are under analysis: Cologne (Germany), Pamplona (Spain), Milan and Rimini (Italy). The implementation of those flood risk analyses involves only openly available data or those generated in the project with open data. The model is being validated using aggregated insurance data from various flood events that have affected the four cities in the past. The results are accessible as an online web tool (https://platform.saferplaces.co), where the damage model presented in this study can be run under different flood mitigation scenarios.
The possibility of implementing precautionary measures in a BN-based flood damage model will be investigated. The main consideration here is the need for using continuous variables, which prevents the use of indicators such as presence of a given precautionary measure. Our approach will be complementary to damage modelling of commercial assets, based on dedicated German post-disaster surveys and a BN-based flood damage model (Paprotny et al. 2020b). Both damage models are publicly accessible as part of a toolbox for nonparametric Bayesian networks (Paprotny et al. 2020c).

Conclusions
The aim of this paper was to combine post-disaster flood survey data into a flood damage model able to accurately predict losses to residential assets. It resulted in a nonparametric Bayesian network (BN) with seven variables: 2 variables of interest (relative loss to building structure, and separately to household contents) and 5 explanatory variables related to flood hazard, exposure and vulnerability. The most important variable was water depth, followed by floor space area, with the remaining having very similar importance (flow velocity, return period, regional income per capita). The model, by combining data from three countries (Germany, Italy, the Netherlands) and two flood types (fluvial, pluvial), includes the most diverse set of flood loss data used in residential damage modelling.
The RMSE of the damage model was on average 7.8 and 13.5 percentage points. Errors for the 11 events included in the dataset were largely proportional to the variability of data. Overall losses from events were overestimated for the 1993 Meuse river flood for both buildings and contents loss, and for pluvial events for losses to buildings. The observed loss from the German and Italian events was mostly within the uncertainty range of the model. The BN-based flood damage model was further applied to the 2010 coastal flood in France, as coastal events were not included in the sample used to build the model. It was also compared with alternative flood models for this case study. Some of the other models achieved better performance than our model, but the results are sensitive to exposure estimates. The BN model had the best match with average observed losses per household for the whole area affected in 2010 (2% difference), but underestimated losses for the 2 regions with most impacts (22% difference). The additional case study of the flood in Saxony in 2013 indicated a 12% overestimation of average losses. The results, including many configurations of models presented in the supplement, highlight that the approach of combining multiple flood events has the potential to create a more transferable and universal model.
The BN model can be combined with exposure estimation routines from Paprotny et al. (2020a) to provide estimates of residential flood losses in Europe. It allows building-level flood damage to be estimated from open data sources, together with uncertainty information.
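As an illustration of how a probabilistic relative-loss estimate and an exposure estimate could be combined, the sketch below multiplies samples of relative loss by a building's exposure value and summarises the result as a mean and an uncertainty interval. It is only a sketch under stated assumptions: the function name, the exposure value and the use of a beta distribution as a stand-in for the BN's conditional loss distribution are all hypothetical and are not taken from the BANSHEE toolbox or from Paprotny et al. (2020a).

```python
import random
import statistics


def absolute_loss_summary(relative_loss_samples, exposure_value, lower=0.05, upper=0.95):
    """Convert relative-loss samples (fractions of exposure) into absolute losses
    and return their mean and an empirical uncertainty interval."""
    if not relative_loss_samples:
        raise ValueError("at least one sample is required")
    losses = sorted(r * exposure_value for r in relative_loss_samples)

    def quantile(q):
        # Nearest-rank empirical quantile on the sorted losses.
        index = min(len(losses) - 1, max(0, round(q * (len(losses) - 1))))
        return losses[index]

    return {
        "mean": statistics.fmean(losses),
        "lower": quantile(lower),
        "upper": quantile(upper),
    }


# Stand-in for samples from a conditional loss distribution (assumption:
# a beta distribution is used here purely for illustration).
rng = random.Random(0)
samples = [rng.betavariate(2, 8) for _ in range(1000)]

# Hypothetical replacement value of one building, in euros.
summary = absolute_loss_summary(samples, exposure_value=250_000)
print(summary)
```

Reporting the interval alongside the mean keeps the uncertainty information that a deterministic damage function would discard.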
The modelling procedure is being tested on case studies from several European countries to validate the robustness and comparability of the method. The model presented here is part of a larger activity to make flood risk analysis tools accessible through online tools and publicly available code.
the project URBAS (BMBF, 0330701C). Fluvial and pluvial data collections in Germany were additionally supported by a joint venture between the German Research Centre for Geosciences GFZ, the University of Potsdam and the Deutsche Rueckversicherung AG, Duesseldorf. The results of the cross-validation and Xynthia case study are available in figshare (https://doi.org/10.6084/m9.figshare.12045345). Most of the German flood loss data are available via the German flood damage database HOWAS21 (https://howas21.gfz-potsdam.de/howas21/). The 1993 Meuse flood data are available in the supporting information of Wagenaar et al. (2017). The BN damage model is openly available as MATLAB code as part of the BANSHEE toolbox (https://github.com/dompap/BANSHEE).
Funding Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.