Pan-European hydrodynamic models and their ability to identify compound floods

The interaction between storm surges and inland run-off has been gaining increasing attention recently, as together they have the potential to result in compound floods. In Europe, several flood events of this type have been recorded in the past century in Belgium, France, Ireland, Italy and the UK. First projections of compound flood hazard under climate change have been made, but no study has so far analysed whether existing, independent climate and hydrodynamic models are able to reproduce the co-occurrence of storm surges, precipitation, river discharges or waves. Here, we investigate the dependence between the different drivers in different observational and modelled data sets, utilizing gauge records and high-resolution outputs of climate reanalyses and hindcasts, as well as hydrodynamic models of European coasts and rivers. The results show considerable regional differences in the strength of the dependence in surge–precipitation and surge–discharge pairs. The models reproduce those dependencies, and the time lags between the flood drivers, rather well in north-western Europe, but less successfully in the southern part. Further, we identified several compound flood events in the reanalysis data. We were able to link most of those modelled events with historical reports of flood or storm losses. However, the reanalysis also produced false positives and false negatives, and it missed several large compound floods. All in all, the study still shows that accurate representation of compound floods by independent models of each driver is possible, even if not yet achievable at every location.


Introduction
Compound floods are a specific type of flood in which two or more of the following drivers coincide in space and time: storm surges, waves, tides, precipitation and high river discharges. The coincidence can be simultaneous or successive; the drivers can amplify each other, or even lead to impacts when neither driver is extreme by itself (Leonard et al. 2014; Zscheischler et al. 2018). Presently, growing consideration is given to the possible co-occurrence of hazards previously considered independently (Leonard et al. 2014). This attention is driven primarily by the damage caused by the coincidence of surge and excessive rainfall during tropical cyclones in the USA, including the $150-billion deluge in Houston during hurricane Harvey in August 2017 (van Oldenborgh et al. 2017). Yet the climate of Europe differs substantially from that of the American coasts, which are often affected by tropical storms. Even in the USA, along coasts outside the paths of hurricanes there is very little dependence between coastal water levels and heavy precipitation (Wahl et al. 2015), while correlation between river flows and surges is spatially diverse (Moftakhari et al. 2017). European coasts are affected by extra-tropical cyclones, with diverse mechanisms of fluvial and pluvial floods. At the same time, several large European cities located in river estuaries are prone to coastal floods, such as Antwerp, Hamburg, London and Rotterdam.
Historical information on past damaging floods in Europe reveals that compound events have occurred in many locations. According to the HANZE database (Paprotny et al. 2018), out of 1564 floods that occurred in 37 European countries between 1870 and 2016, 23 (1.5 %) were compound floods, recorded in six countries. The highest number of compound events, nine, was observed along the northernmost coast of the Adriatic Sea, in the Italian regions of Veneto and Friuli-Venezia Giulia (1927, 1951, 1952, 1953, 1957, 1966, 1986, 2008). In those situations, the events' river and coastal components merely occurred at the same time, generally without directly exacerbating total water levels. Altogether, Adriatic Sea surges and coinciding high flows in the Po river resulted in approximately 25 fatalities and several thousand people affected. Another "hotspot" for compound events is the Mediterranean coast of France, where five damaging compound floods could be identified (1872, 1997, 2005, 2006, 2013). In 1872, the surge coincided with 8 days of rain, resulting in 18 fatalities. The December 1997 event affected the vicinity of the Rhone river estuary, which was swelled by 669 mm of rainfall in 4 days and a storm surge. The other three were flash floods caused by more than 200 mm of rain in 24 h at the time of high sea levels induced by strong winds. Both floods caused one fatality each and many losses in several locations on the southern coast (and Corsica in 2013). The western coast of France witnessed compound floods as well, for example along the Charente river in 1962 (1600 persons affected) and several rivers in the Brittany region in 2000 (600 persons affected). In both cases, a storm surge appeared during a particularly wet period, which kept river flows elevated for a long time.
The remaining compound events across Europe are similar to those occurring on the western coasts of France. Surges and long periods of rainfall elevating river water levels caused compound events in Ireland in 2004 (200 persons affected) and 2009 (6800 affected), in England along the Humber in 1954 (4000) and the Bristol Channel in 1999 (1200), as well as on the river Scheldt in Belgium in 1928 (10,000). By contrast, the causes of the 1928 Thames flood, which resulted in 14 deaths in London and affected 4000 people, were unusual, as high river discharge was a consequence of snowmelt and the relatively moderate storm surge was exacerbated by a high tide. In the Baltic Sea, the only known compound event was the storm surge along the Polish coast in 2009. There, several consecutive storm surges combined with strong northerly winds increased water levels and caused inundation along several rivers (Kowalewska-Kalkowska 2018). Flooding along the Odra river went as far as the city of Szczecin, 70 km upstream.
Many studies have analysed compound flood hazard in Europe using various observational data sets and spatial scales, including the co-occurrence of storm surges with extreme precipitation (Bengtsson 2016), river discharges (Bevacqua et al. 2017; Ward et al. 2018; Ganguli and Merz 2019a, b; Hendry et al. 2019), precipitation and discharges (Svensson and Jones 2002, 2004) and waves (Wahl et al. 2012; Gouldby et al. 2014). Some studies also used hydrodynamic models to derive drivers over larger areas (Petroliagkis 2018; Couasnon et al. 2019; Khanal et al. 2019). Given that climate change is expected to increase the level of hazard in many parts of the continent through higher sea levels (Vousdoukas et al. 2016a), river discharges (Alfieri et al. 2015) and extreme precipitation (Lehtonen et al. 2014), there is a strong need to model compound floods and produce future projections. Recent studies have made the first such projections (Kew et al. 2013; Arns et al. 2017; Bevacqua et al. 2019).
However, there has not yet been an analysis of whether existing climate and hydrodynamic models are capable of recreating the dependence between compound flood drivers over larger domains. Presently, models for the different drivers are calibrated and validated individually. Even where individual performance is satisfactory, the same may not hold once the outputs are merged to derive compound events. This aspect is important if projections of future compound floods are to be robust.
In this study, we evaluate the ability of high-resolution pan-European climatic and hydrological models to reproduce the dependencies found in observations. Specifically, we compare the dependence measures (mainly upper tail dependence coefficient) computed on the basis of three sets of data: observations, reanalysis and hindcast. Observations come from a set of river, coastal and wave gauges as well as gridded interpolated rain gauge measurements. Modelled data consist of data sets created using a climate reanalysis, in which the model output is corrected at each timestep with observations, and a climate hindcast, in which the model is run based on a set of initial conditions but is not corrected during the run. Output from a hydrological model of European rivers and a hydrodynamic model of European coastal waters was also used, driven by the available climate data sets. We also identify compound events using reanalysis data, investigate their plausibility in terms of historical records of flood impacts and search for historical cases of compound floods in the modelled data.

Domain and data
Data sets collected for this study are summarized in Table 1. Direct measurements (observations) were fundamental to the analysis. Hourly records of sea levels were taken from 156 gauges, the same as used in Paprotny et al. (2016). The tidal component was removed from the data through a harmonic analysis. A skew surge approach was then applied, in which the surge height is the difference between the predicted astronomical high tide and the nearest observed high water. This gives more certainty than using the residual directly, as any difference in timing of the predicted and actual tide creates an "illusory" surge (Batstone et al. 2013). For detailed sources of the sea level data and information on how the data were processed, we refer to Paprotny et al. (2016). Records of daily river discharges were collected from 1791 gauges and are the same as in Paprotny and Morales-Nápoles (2017). Daily precipitation totals were drawn from the E-OBS v16.0 data set, which is a gridded interpolation of measurements taken at weather stations with a 0.25° resolution (Haylock et al. 2008). Finally, significant wave heights from 48 buoys were taken from Vousdoukas et al. (2017). It should be noted that significant wave height is traditionally defined as the average of the one-third highest individual waves (Vanem 2016).
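The skew surge definition above can be illustrated with a minimal sketch. It assumes that predicted and observed high waters have already been extracted as (time, level) pairs and that the surge is taken as observed minus predicted; the function name, data layout and numbers are hypothetical and are not taken from Paprotny et al. (2016).

```python
def skew_surges(predicted_high_waters, observed_high_waters):
    """Pair each predicted high tide with the nearest observed high water.

    Both arguments are lists of (time_in_hours, level_in_metres) tuples.
    Returns one (predicted_time, skew_surge) tuple per predicted high tide,
    where skew_surge = observed level - predicted level (assumed sign).
    """
    result = []
    for t_pred, h_pred in predicted_high_waters:
        # Nearest observed high water in time, so a small phase shift
        # between predicted and actual tide does not create a false surge.
        t_obs, h_obs = min(observed_high_waters,
                           key=lambda hw: abs(hw[0] - t_pred))
        result.append((t_pred, h_obs - h_pred))
    return result


# Illustrative values only.
predicted = [(0.0, 2.10), (12.4, 2.05), (24.8, 2.20)]
observed = [(0.5, 2.40), (12.1, 2.00), (25.3, 2.95)]
surges = skew_surges(predicted, observed)
```

Because only the high-water levels are compared, the half-hour timing offsets in the example do not affect the resulting surge heights.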
The comparative model data were taken from several sources. Both storm surges and river discharges were drawn from high-resolution pan-European models with better accuracy compared with other models, as shown in Paprotny et al. (2016, 2019) and Rojas et al. (2011). Daily river discharges in a gridded, 5-km network were obtained from the European Flood Awareness System (EFAS), which utilizes the Lisflood hydrological model (Alfieri et al. 2016). In the reanalysis, the model was forced by gridded meteorological observations at 5 km resolution, EFAS-Meteo (Ntegeka et al. 2013), rather than a climate reanalysis data set. Sub-daily storm surge heights were simulated by Paprotny et al. (2016) using Delft3D (Deltares 2014) with a 0.11° regular rotated-pole grid (approx. 12.5 km) driven by wind and air pressure data from the global ERA-Interim climate reanalysis (Dee et al. 2011). Precipitation amounts were taken directly from ERA-Interim, which has a 0.75° resolution. Lastly, sub-daily significant wave heights were obtained from WaveWatch III simulations (Tolman 2002), also driven by ERA-Interim. The simulation was carried out by Mentaschi et al. (2017), and the results are available per 25 km coastal segment. Further evaluations of the storm surge and river discharge models are presented in Paprotny et al. (2019) and Rojas et al. (2011, 2012). As ERA-Interim is mostly used here as model driver, we will call this group of data sets "reanalysis" for brevity, even though technically only ERA-Interim is a reanalysis, i.e. the model output was corrected at each timestep with observations. The reanalysis data are supplemented by simulations utilizing hindcasts from regional climate models within the EURO-CORDEX framework. Those high-resolution models are also used to generate future projections; hence, their accuracy in reproducing the dependence structure is important.
To be comparable, the different variables would need to originate from the same climate model run. Here, storm surge and river discharge simulations, made with Delft3D and EFAS, respectively, were only both available for the RCA4 regional model (Strandberg et al. 2015) coupled with the  Paprotny et al. (2016) and Alfieri et al. (2015). Precipitation was obtained directly from the RCA4 historical run at 0.11° resolution. An evaluation of EURO-CORDEX precipitation data is presented by Kotlarski et al. (2014). The study area is the same as in Paprotny et al. (2016) and comprises all European coasts with the exception of outlying islands and the majority of the Russian and Ukrainian coastlines. Tide gauges were connected with the nearest E-OBS grid cell. River gauges were connected with the nearest tide gauge with sufficient data for comparison (see the next section), with the search radius limited to 200 km, as in Ganguli and Merz (2019a). Grid cells of the Delft3D model were connected with the geographically nearest grid cell of the other data sets, including tide gauges. A different method was used only for linking river gauges and EFAS grid cells. For each river gauge, the EFAS grid cell with the smallest difference in catchment area within a 10 km radius was considered to correspond to the river gauge in question.
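The two linking rules described above (nearest neighbour within a search radius, and closest catchment area within 10 km) can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: great-circle distance is assumed as the distance measure, and the dictionary keys (`lat`, `lon`, `area_km2`), function names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_within(site, candidates, max_km):
    """Nearest candidate no farther than max_km, or None if there is none."""
    best, best_d = None, max_km
    for c in candidates:
        d = haversine_km(site["lat"], site["lon"], c["lat"], c["lon"])
        if d <= best_d:
            best, best_d = c, d
    return best


def match_efas_cell(gauge, cells, max_km=10.0):
    """Cell within max_km whose catchment area is closest to the gauge's."""
    nearby = [c for c in cells
              if haversine_km(gauge["lat"], gauge["lon"], c["lat"], c["lon"]) <= max_km]
    if not nearby:
        return None
    return min(nearby, key=lambda c: abs(c["area_km2"] - gauge["area_km2"]))


# Illustrative coordinates and areas only.
river_gauge = {"lat": 52.0, "lon": 4.0, "area_km2": 1200.0}
cells = [
    {"name": "A", "lat": 52.01, "lon": 4.01, "area_km2": 300.0},   # closest, wrong river
    {"name": "B", "lat": 52.05, "lon": 4.05, "area_km2": 1150.0},  # similar catchment
    {"name": "C", "lat": 52.50, "lon": 4.00, "area_km2": 1200.0},  # outside 10 km
]
tide_near = {"name": "T1", "lat": 52.5, "lon": 4.0}
tide_far = {"name": "T2", "lat": 55.0, "lon": 4.0}

cell = match_efas_cell(river_gauge, cells)
tide = nearest_within(river_gauge, [tide_near, tide_far], 200.0)
```

Matching on catchment area rather than distance alone guards against a gauge on a large river being paired with an adjacent grid cell that represents a small tributary.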
Last but not least, historical information on flood impacts was gathered. A list of 12 compound floods since 1979 with quantified impacts was drawn from the HANZE database (Paprotny et al. 2018). Compound events identified in the reanalysis data (see the next section) were validated by consulting multiple resources. Relevant information was obtained from Paprotny et al. (2018), Consiglio Nazionale delle Ricerche (2019)

Defining compound floods and dependence measures
The first step in assessing dependence between drivers of compound floods was generating a series of storm surge events. Due to the resolution of precipitation and river discharge data, we use daily maximum storm surge heights and maintain this resolution throughout the study, notwithstanding the availability of finer data. Surges were identified by setting a 90th percentile threshold of the daily maxima and then aggregating consecutive daily occurrences of surge heights above the threshold. Each storm surge event is described by the maximum surge height during the event. Due to the possible inaccuracies resulting from the different distances between gauges and grid cells of different model data sets, one additional day was added to the beginning and end of each storm surge event. Then, the maximum precipitation amount and river discharge occurring during each storm event window were calculated. This procedure was carried out three times, i.e. using sets of observations, reanalysis and hindcast data. Only locations with at least 30 storm surge events with corresponding precipitation or discharge data were used for further analysis and are shown in the results.
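The event definition above can be sketched in a few lines. The sketch assumes a gap-free daily series held in a plain list and a linearly interpolated percentile; the paper does not state which percentile definition was used, and the function names and data are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in 0-100) of a list of numbers."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def surge_events(daily_max_surge, q=90.0, pad_days=1):
    """Group consecutive days above the q-th percentile into events.

    Each event carries its peak surge and a day-index window extended by
    pad_days on both sides (clipped to the series length).
    """
    threshold = percentile(daily_max_surge, q)
    n = len(daily_max_surge)
    runs, start = [], None
    for day, value in enumerate(daily_max_surge):
        if value > threshold:
            if start is None:
                start = day
        elif start is not None:
            runs.append((start, day - 1))
            start = None
    if start is not None:
        runs.append((start, n - 1))
    return [
        {
            "peak": max(daily_max_surge[first:last + 1]),
            "window": (max(0, first - pad_days), min(n - 1, last + pad_days)),
        }
        for first, last in runs
    ]


def window_max(series, window):
    """Maximum of a co-located daily series inside an event window."""
    first, last = window
    return max(series[first:last + 1])


# Illustrative 30-day series: a two-day surge and a separate one-day surge.
surge = [0.1] * 30
surge[5], surge[6], surge[20] = 1.0, 1.5, 1.2
rain = [0.0] * 30
rain[4], rain[6], rain[22] = 35.0, 12.0, 50.0

events = surge_events(surge)
paired = [(e["peak"], window_max(rain, e["window"])) for e in events]
```

In the example, the rainfall on day 4 is paired with the first surge only because of the one-day padding, while the rainfall on day 22 falls outside the second window and is ignored.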
Additionally, we analysed the effect of adding wave heights to storm surges, by merging available series of observations and reanalysis of surge heights with 20% of significant wave height. The 20% value is "considered to be a reliable approximation of the wave setup, i.e. the elevation in mean water level near the coast due to wave shoaling and breaking" (Vousdoukas et al. 2016b). Then, the procedure of identifying storms and connecting with the other data sets from the previous paragraph was repeated. We did not, however, analyse the dependence between surge and waves due to a very limited number of overlapping observations (available for 8 tide gauges). Still, as the number of wave observations is small also for combined wave and surge data, we only describe this in the discussion.
Once the data were prepared, we calculated the dependence between pairs of variables. Several measures have been used in compound flood studies, including Kendall's $\tau$ (Wahl et al. 2015; Hendry et al. 2019), Coles' $\chi$ (Svensson and Jones 2002, 2004) and Spearman's rank correlation $r$ (Couasnon et al. 2019). The primary measure used here is the upper tail dependence coefficient (UTDC), which is a nonparametric statistic. To compute it, we use the Capéraà-Fougères-Genest estimator as in Ganguli and Merz (2019b). For brevity, we will simply denote this metric as $\lambda$:
$$\lambda = 2 - 2\exp\left[\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{\sqrt{\log\frac{1}{u_i}\,\log\frac{1}{v_i}}}{\log\frac{1}{\max(u_i, v_i)^2}}\right)\right]$$
where $u$ and $v$ are the cumulative empirical distributions of the pair of variables in question and $n$ is the number of pairs (Frahm et al. 2005). Positive values of $\lambda$ indicate that there is dependence in the upper tail of the distribution. Only this metric is featured in the figures, but two other measures, namely Spearman's and Kendall's correlations, are also discussed in the results. Importantly, whenever the difference between the modelled and observed dependence is shown in figures or discussed in the text, it was calculated using identical time series length. This adjustment was made by censoring both the modelled and observed series according to their exact availability at a given station.
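As an illustration of how the upper tail dependence coefficient can be computed, the sketch below implements the Capéraà-Fougères-Genest estimator in the form given by Frahm et al. (2005): 2 minus 2 times the exponential of the mean of log(sqrt(log(1/u) log(1/v)) / log(1/max(u, v)^2)). The empirical distribution is approximated here by ranks divided by n + 1, which is an assumption of this sketch (the paper does not state the plotting position), and the function names are hypothetical.

```python
import math


def pseudo_observations(values):
    """Empirical CDF values rank / (n + 1); ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def utdc_cfg(x, y):
    """Capéraà-Fougères-Genest estimate of the upper tail dependence."""
    u = pseudo_observations(x)
    v = pseudo_observations(y)
    total = 0.0
    for ui, vi in zip(u, v):
        numerator = math.sqrt(math.log(1.0 / ui) * math.log(1.0 / vi))
        denominator = math.log(1.0 / max(ui, vi) ** 2)
        total += math.log(numerator / denominator)
    return 2.0 - 2.0 * math.exp(total / len(u))


# Illustrative data: perfectly ordered pairs and perfectly reversed pairs.
surge_peaks = list(range(1, 51))
same_order = [2.0 * s for s in surge_peaks]
reverse_order = list(reversed(surge_peaks))

lam_same = utdc_cfg(surge_peaks, same_order)
lam_reverse = utdc_cfg(surge_peaks, reverse_order)
```

Perfectly concordant pairs give an estimate of 1, while reversed pairs give a negative value, which is why the estimator can return negative coefficients for some stations even though tail dependence itself is bounded below by zero.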
Selection of events in the reanalysis data was done using three pairs of variables: surge heights (from the Delft3D/ERA-Interim run) with precipitation (from ERA-Interim), surge with river discharge (from the EFAS/EFAS-Meteo run) and surge with river discharge for rivers with a catchment area of at least 500 km². For consistency with discharge and precipitation data, the storm surge events were aggregated here from daily average surge heights rather than maxima. Two variants of the surge-discharge compound event were created because the majority of coastal segments from Delft3D were connected with very small catchments, often of the minimum resolution of the EFAS model (25 km²), hence representing joint coastal-pluvial flood occurrence rather than a coastal-fluvial flood combination. Therefore, the connecting procedure was reversed here, and 621 estuaries of rivers with catchments of at least 500 km² were joined with data from the nearest coastal grid cell of the Delft3D model. Additionally, we used available observations of surge, precipitation and discharge and extracted compound events with the same methodology.
Compound floods in the reanalysis data were identified by first selecting those storm surge events (at each individual grid cell/tide gauge) which exceeded at any point of time a 5-year return period. To calculate the return period, we apply an extreme value analysis to the surge series using the peak-over-threshold approach, which is preferable to block maxima in application to water levels (Arns et al. 2013). We use the generalized Pareto distribution, as follows:
$$x_p = \mu + \frac{\sigma}{\xi}\left(p^{-\xi} - 1\right)$$
where $\mu$ is the location parameter, $\sigma$ is the scale parameter, $\xi$ is the shape parameter, $p$ is the probability of occurrence and $x_p$ is the corresponding level. $\mu$ is a threshold value defined manually; it was set to the 90th percentile, as used to extract surge events. The extreme value analysis was repeated for precipitation and river discharges. Any occurrence of precipitation or discharge with a 5-year return period or more during an extreme storm surge event (including the additional ± 1 day window) was considered a compound flood.
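The return-period test can be sketched as below, assuming a generalized Pareto distribution for peaks above the threshold and a known average number of peaks per year. The paper does not state how the parameters were fitted; the method of moments used here is a stand-in chosen only to keep the example dependency-free, and all function names and numbers are hypothetical.

```python
import math


def fit_gpd_moments(peaks, threshold):
    """Method-of-moments estimates (scale, shape) of a GPD for the excesses."""
    excess = [p - threshold for p in peaks if p > threshold]
    n = len(excess)
    mean = sum(excess) / n
    var = sum((e - mean) ** 2 for e in excess) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    return scale, shape


def return_period(x, threshold, scale, shape, peaks_per_year):
    """Return period in years of level x under the fitted GPD."""
    if x <= threshold:
        return 0.0  # not an extreme; treated as zero in this sketch
    z = (x - threshold) / scale
    if abs(shape) < 1e-9:
        p = math.exp(-z)  # exponential limit of the GPD
    else:
        base = 1.0 + shape * z
        if base <= 0.0:
            return math.inf  # beyond the upper end point (negative shape)
        p = base ** (-1.0 / shape)
    return 1.0 / (peaks_per_year * p)


def is_compound(surge_rp, driver_rp, min_years=5.0):
    """Both the surge and the second driver reach the return-period limit."""
    return surge_rp >= min_years and driver_rp >= min_years


# Illustrative check: with 4 peaks per year, the 5-year level is the one
# exceeded by 1 in 20 peaks.
level = 0.5 + 0.2 / 0.1 * (20.0 ** 0.1 - 1.0)
rp = return_period(level, threshold=0.5, scale=0.2, shape=0.1, peaks_per_year=4.0)
scale_hat, shape_hat = fit_gpd_moments([1.0, 3.0], threshold=0.0)
```

In practice a maximum-likelihood fit from a statistics library would normally replace `fit_gpd_moments`; the return-period logic stays the same.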

Results
In this section, each subsection is accompanied by a figure with two panels; in each case, the left one presents the surge-precipitation combination, and the right one the surge-discharge combination. The temporal resolution is daily, and when the dependence measures are shown or discussed regarding the model data, they are for the same combinations of gauges or grid points as in the observations.

Dependence structures in observational data
Dependence between storm surge heights and precipitation amounts is presented in Fig. 1a. The strongest dependence (upper tail dependence coefficient $\lambda$ of 0.15 and more) is observed on the west-facing coasts, such as western Great Britain, the Netherlands, western Norway, and western Iberia. Along the coasts on the opposite side (eastern Great Britain and Spain), the dependence is mostly weaker, with $\lambda$ between 0.05 and 0.15. Almost all stations in the Baltic Sea show very similar values (about 0.15). The dependence is much weaker in the Mediterranean Sea, though the availability of long data series is limited. Among the stations with the highest $\lambda$ in the region are Venice and some locations in southern France; both areas recorded compound floods in the past. Kendall's and Spearman's correlations differ substantially from the upper tail dependence coefficient and are higher on average by 0.1 and 0.18, respectively.
Dependence between storm surges and river discharges shows many similarities (Fig. 1b). The highest $\lambda$ is also observed on west-exposed coasts from Scotland through France to Portugal, often above 0.25. In Scandinavia, higher values are also noticeable along the main storm corridor in Europe, which passes through Denmark and southern Sweden. The correlation visibly weakens moving north through the peninsula, turning negative across Lapland. In Great Britain, $\lambda$ decreases moving east. Along the Mediterranean there are almost no stations with sufficiently long overlapping tide and river gauge records. Correlation measures again show higher values than $\lambda$, but in contrast to surge-precipitation events, the distribution of low and high values is very similar.

Dependence structures in reanalysis data
The reanalysis data show, on average, a smaller upper tail dependence coefficient than was computed with the observed surge and precipitation data (Fig. 2a), if the same data length is considered (0.092 instead of 0.095). Overestimation is mostly found in north-western Europe, while the correlations are underestimated in southern Europe. The smallest differences (below 0.05) were identified in Scandinavia and central Great Britain. The largest errors in both directions were found around the straits of Gibraltar and Otranto. All three correlation measures show similar differences between observations and the reanalysis (Table 2). Kendall's $\tau$ has the lowest error and bias out of the three measures. The absolute values of $\lambda$ from the reanalysis show far more even distribution than in the observations (Supplementary Figure S1a), with stronger contrast between west- and east-facing coasts in northern Europe. Importantly, the correlation in the Mediterranean is shown as similar to that in north-western Europe.
Fig. 1 Upper tail dependence coefficient (UTDC) between observed storm surge events and a precipitation; b river discharges. River gauges are connected to the nearest tide gauge with sufficient data up to 200 km radius
Fig. 2 Difference in upper tail dependence coefficient (UTDC) between reanalysis and observed data, for the combination of storm surge events and a precipitation; b river discharges. The connections between river gauges and tide gauges from observations are preserved here for comparability with Fig. 1. The modelled and observed data were censored to a common series length
Using the reanalysis for the combination of surge and discharge data results, on average, in lower dependence values (Fig. 2b), if the same data length is considered (0.07 instead of 0.10). The $R^2$ between observed and modelled $\lambda$ is high (0.61) and even better for $r$ and $\tau$. However, the mean error and bias are higher than for surge-precipitation pairs. The negative bias is the strongest in Scandinavia, as well as in south-east England. Some strong positive bias is noticeable mostly for continental Europe and Iceland. For most other gauges, the differences are small. The absolute values of $\lambda$ from the reanalysis show a similar distribution compared with observations (Supplementary Figure S1b), but with much stronger contrast between northern and southern Scandinavia and different coasts of Great Britain. The reanalysis also provides data for the Mediterranean region missing in the observations; the dependencies are weaker compared to coasts exposed to the Atlantic Ocean (below 0.1 rather than above 0.2), especially in the Alps, where they turn negative.

Dependence structures in hindcast data
Hindcast data are slightly better at reproducing the combination of surge and precipitation data than the reanalysis (Fig. 3a). However, there is a lack of observational data in southern Europe within the timeframe of the EURO-CORDEX hindcast (1970-2005). The contrast in the accuracy of the two correlation measures and $\lambda$ is greater than in the reanalysis, with $R^2$ between observed and modelled correlation being almost 0.5 and the overall bias very low. The absolute values of $\lambda$ from the hindcast show a similar, rather even distribution along the different coasts (Supplementary Figure S2a). Only North Sea coasts show noticeably lower dependence (by about 0.1), while in some locations in the Mediterranean the dependence is stronger compared with reanalysis, let alone observations ($\lambda$ above 0.25 in many locations).
Table 2 Comparison of dependence measures obtained from observed data with reanalysis (Rean.) and hindcast (Hind.) data, by compound event type and error metric. UTDC is the upper tail dependence coefficient. Mean indicates the average value of the dependence measure. $R^2$ is the coefficient of determination, MAE is the mean absolute error, and MBE is the mean bias error between the observed value of a given dependence measure and the value obtained for the same locations in the reanalysis or hindcast. A common series length was used for observed and modelled data for comparability. Results are not fully comparable between reanalysis and hindcast due to different observed data availability per given timeframe
Fig. 3 The connections between river gauges and tide gauges from observations are preserved here for comparability with Fig. 1. The modelled and observed data were censored to a common series length
Quite different results were computed for the combination of surge and discharge data (Fig. 3b). Accuracy of the hindcast is this time slightly worse compared to the reanalysis. There is larger error and stronger negative bias.
The dependence measured by $\lambda$ is underestimated across Scandinavia and the British Isles, though there are some pockets of multiple stations with strong positive bias. The results are very similar to the reanalysis for the European continent proper. Kendall's $\tau$ is reproduced with less bias or error than $\lambda$ or $r$. As in the reanalysis, the absolute values of $\lambda$ from the hindcast are similarly distributed compared to the values obtained from observations (Supplementary Figure S2b). Yet, the contrast between northern and southern Scandinavia and the west- and east-facing coasts is far more stark. The hindcast indicates slightly weaker dependence between surge and river discharge than the reanalysis.
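The error metrics used to compare observed and modelled dependence (coefficient of determination, mean absolute error and mean bias error) can be sketched as follows. The sketch assumes that the coefficient of determination is the squared Pearson correlation and that bias is defined as modelled minus observed; neither convention is stated explicitly in the text, and the input values are illustrative rather than taken from Table 2.

```python
def error_metrics(observed, modelled):
    """R2 (squared Pearson correlation), MAE and MBE for paired values."""
    n = len(observed)
    diffs = [m - o for o, m in zip(observed, modelled)]
    mae = sum(abs(d) for d in diffs) / n
    mbe = sum(diffs) / n  # positive: the model overestimates on average
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    r2 = cov * cov / (var_o * var_m)
    return {"R2": r2, "MAE": mae, "MBE": mbe}


# Illustrative dependence values at three stations.
observed_dep = [0.10, 0.20, 0.30]
modelled_dep = [0.15, 0.25, 0.35]
metrics = error_metrics(observed_dep, modelled_dep)
```

The example shows why the three metrics are reported together: a model can track the spatial pattern perfectly (high coefficient of determination) and still carry a constant bias.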

Identification of compound floods in reanalysis data
Reanalysis data were used to identify compound floods between 1979 and 2014 (surge-precipitation events) or between 1990 and 2013 (surge-discharge events). A total of 60 events were identified: 22 surge-precipitation events, 29 surge-discharge events (of which 9 involved large rivers with catchments of 500 km² or more) and 7 events combining high surge, precipitation and discharge (of which 4 involved large rivers). During 17 events, the return periods of each driver were 10 years or more for at least part of the affected area; for the remaining events, the return periods of each driver were 5 years or more. The total length of the 60 events was 73 days. The modelled surge-precipitation events were located mostly on the western coasts of the Iberian Peninsula and France, with scattered events in the Mediterranean Sea and very few occurrences in northern Europe (Fig. 4a). Surge-discharge events show many occurrences along the western coast of Great Britain, the Baltic and Aegean sea coasts and in western France. Events involving large rivers were indicated exclusively in southern Europe (Fig. 4b).
The full list of events, their details and uncovered historical observations on their possible impacts is provided in Supplementary Table S1. Below, we discuss the results grouped in three categories: true positives, which are events indicated as compound in the reanalysis, and for which at least evidence of a single-driver flood could be found; false positives, for which no evidence of a flood was traced; and false negatives, which are compound events known from gauge records and/or flood damage reports, but not found in the reanalysis.
Fig. 4 Compound flood events identified in the reanalysis data, for the combination of storm surge events and a precipitation; b river discharges. The colours indicate the number of days with occurrence of compound events (some events lasted more than 1 day). A compound event is defined here as the occurrence of a storm surge event with a maximum of daily average height at least equal to a 5-year return period, during which (with an additional ± 1 day window) daily precipitation or river discharge peaked at a value at least equal to a 5-year return period
True positives
Out of 60 events, reports of floods and/or damages were found for 41, of which for 33 flood damage was reported. Three major compound floods were found in the reanalysis data (described in Paprotny et al. (2018)). The flood in Venice on 31 January-1 February 1986 combined storm surge, heavy rainfall and river discharge (the latter known only from reports, as the period is not covered by the reanalysis discharge series). Five fatalities and 37 million euro losses in 2011 prices were reported. Then, the reanalysis shows a compound event (with both heavy precipitation and high discharge) on 25 December 1999, when storms "Lothar" and "Martin" swept through the UK and France. A total of 1200 people were affected in the UK, while in France there were 17 fatalities, though mostly as a result of the windstorm. Finally, the reanalysis correctly indicated as compound the 2004 flood which affected southern Ireland with 50 million euro damages. A compound event on 17 December 1997 in southern France was not shown by the reanalysis within the selected threshold, but the reanalysis shows a compound event in Spain and Portugal on that date; Spain was indeed affected by a fluvial flood, with some houses damaged. For other events, damage reports vary in detail and extent; sometimes, they pertain to a slightly different area than indicated by the reanalysis.
In general, the reports indicate floods caused only by extreme rainfall or high discharge, and occasionally coastal floods driven by storm surge only. Many flash and fluvial flood reports could be found for southern Europe on the dates indicated by the reanalysis. Six events were also discovered in the tide/rain/river gauge records. In one case, all three variables (surge height, precipitation, river discharge) exceeded a 5-year return period in observations from the A Coruña Province, Spain. This event has the largest spatial extent of all surge-precipitation events modelled in the reanalysis, covering the northern coast of Spain and western France. However, information on the impacts of this event is scarce; only for Spain is some fluvial flood damage to houses and infrastructure indicated. Other compound events found in both hydrological observations and the reanalysis include a surge-discharge event along the Danish coast next to the Skagerrak and Kattegat straits, which was recorded by gauges on the opposite coast of Sweden; a coastal flood in Estonia (more than 3,000 people affected, 1 fatality), which is shown as compound by both the reanalysis and Swedish gauges; the 2006 flood in Ireland and the UK, recorded by British gauges; and the 2012 flood in Latvia, recorded by Swedish gauges.
False positives
For 22 events indicated in the reanalysis, no damage reports were found, though for three events some storm damage was mentioned by sources. In five cases, the lack of impact information could be explained by very limited habitation along the coast in question, as some compound events are shown for remote parts of Iceland and Norway. Several other events might not have led to any impacts due to low intensity of surge, rainfall or discharge despite a return period of 5 years or more. This is mainly the case in southern Europe, where storm surge heights are small, and the reanalysis highlights compound events with surges of only 30-40 cm. In other situations, discharge or rainfall is hardly in the range capable of causing more than very localized damage.
Still, the lack of impact reports is particularly noticeable for compound events involving rivers with catchment area of 500 km². In most cases, there are either no reports, or the reports pertain to flash floods or to a different river system on the same date. In particular, the reanalysis indicates that two events, in 1996 and 2009, exceeded even the 10-year return period on each margin and occurred in many locations in Italy and Greece, but there are no records that those compound floods happened in real life. This could again be put down to the limited habitation along those rivers in southern Europe and the rather small intensity of the drivers (surge and discharge) in those parts of the continent.
False negatives. Available surge and precipitation observations have shown only three compound events, of which only one was shown in the reanalysis (Spain, 1987). However, neither of the other two events was found in historical flood damage reports, likely due to the limited amount of precipitation involved. As for surge-discharge observations, 22 events were identified, mostly for Sweden and the UK, for which the most extensive data are available. As noted previously, six events could also be found in the reanalysis. The biggest event missed by the reanalysis was a compound flood on 5 January 2001, which caused damages mostly in Brittany (north-western France) and affected southern UK. The 5-year return period was exceeded at three tide gauges and 16 river gauges. The reanalysis also does not show an event in Sweden in January 2007, when over a week-long period the 5-year return period was exceeded at three tide gauges and 10 river gauges. Still, for some events no flood impacts were recorded, including the aforementioned 2007 event. Again, low intensity of the event in countries with rather good flood protection (Norway, Sweden and especially the UK) could explain this lack of records.
Nine out of 12 compound floods from the HANZE database were not included in either observations or the reanalysis. The former can be explained by the lack of data for most of the events' locations: France (1997, 2000, 2005, 2006), Ireland (2009), Italy (2008) and Poland (2009). The French flood in Brittany in 2000 was part of a long series of inundations that battered the region over the autumn and winter of 2000/2001: its November phase was captured by the reanalysis, while the climax in December 2000 was not, and the January 2001 event was found at several gauges, as described in the previous paragraph. Three other French events are known to have involved rainfall exceeding 200 mm in 24 h, while the storm surge component is well described for the floods in the other countries.

Discussion
The comparison between observed and modelled dependence for different compound flood drivers has shown many satisfactory results (similarities in dependence estimates in some regions, historical compound floods traced in reanalysis data) and many unsatisfactory ones (discrepancies between models and observations in other regions, several false positives and false negatives). Firstly, we analyse the data further to investigate one major potential source of inaccuracy, i.e. the difference in arrival time of high sea levels and river discharges compared to the timing of the triggering meteorological event, caused by deficiencies of the hydrodynamic models used. Secondly, we look at the effect of waves on some of the results of the analysis. Finally, we discuss other limitations and sources of uncertainty.

Time lags in compound events
The number of days between the occurrence of a storm surge event and the maximum precipitation/river discharge that results in the highest value of the upper tail dependence coefficient (UTDC) is referred to hereafter as the "time lag". Apart from being useful to investigate the hydrodynamic models' performance, it is also informative, as simultaneous or almost immediate co-occurrence of different flood types is perceived by some as the only possible mechanism of compound floods (Svensson and Jones 2004;Petroliagkis 2018;Ganguli and Merz 2019a, b). Theoretically, an area might be affected, e.g. by a coastal flood, and therefore have a reduced resilience to a fluvial flood occurring some time later, leading to more serious consequences than during a stand-alone occurrence. Also, the time lag in arrival time of surges, precipitation and discharges indicates whether the same storm events have the potential to cause several phenomena in a short time. Figure 5 shows the time lags found in observations. Negative values indicate that the maximum precipitation or discharge arrives before the storm surge. Indeed, this is the case for precipitation maxima along most of north-western Europe (mostly 1-3 days). Still, the record is rather mixed, which on the Atlantic coasts can be attributed to a high frequency of storms in the winter period, which might lead the computation to capture rainfall or discharge coming from a different storm than the surge. Observations for river discharges show positive lags (i.e. discharges arrive later than the surge) in most locations in the continent proper (often of more than 5 days), as it takes time to generate run-off from precipitation, especially in larger rivers. In the Baltic Sea, especially in the northern parts, discharge mostly arrives before the surge due to the long delay of storm surges arriving through the Danish Straits. Many small catchments in the UK also have slightly negative or no lag to the storm surge event.
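The lag search described above can be illustrated with a minimal sketch. It assumes a simple nonparametric UTDC estimate at a fixed quantile (the share of surge exceedances that coincide with driver exceedances), which is not necessarily the estimator used in this study; the function names `empirical_utdc` and `best_lag`, the 0.95 quantile and the 7-day search range are hypothetical choices, and the data are synthetic.

```python
import numpy as np

def empirical_utdc(x, y, q=0.95):
    """Share of days with x above its q-quantile on which y is also above its q-quantile."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_hi = x > np.quantile(x, q)
    y_hi = y > np.quantile(y, q)
    n_x = x_hi.sum()
    return float((x_hi & y_hi).sum() / n_x) if n_x else float("nan")

def best_lag(surge, driver, max_lag=7, q=0.95):
    """Return (lag, utdc) with the highest estimate.

    The surge on day t is paired with the driver on day t + lag, so a
    negative lag means the driver peaks before the surge.
    """
    surge = np.asarray(surge, dtype=float)
    driver = np.asarray(driver, dtype=float)
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            s, d = surge[: len(surge) - lag], driver[lag:]
        else:
            s, d = surge[-lag:], driver[: len(driver) + lag]
        value = empirical_utdc(s, d, q)
        if value > best[1]:
            best = (lag, value)
    return best

# Synthetic daily series (assumption): discharge repeats the surge signal 2 days later.
rng = np.random.default_rng(0)
surge = rng.normal(size=2000)
discharge = np.roll(surge, 2)
lag, utdc = best_lag(surge, discharge)
print(lag, utdc)
```

In this toy case the search recovers a positive lag, i.e. discharge arriving after the surge. With real gauge data, frequent winter storms can make the maximum ambiguous, as noted above.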
Reanalysis data (Fig. 6) give different lags in many locations; only in one-third of stations does the difference in lag with precipitation not exceed 1 day. In the case of river discharge, the consistency is even lower. However, the uncensored reanalysis results show a somewhat smoother distribution of lags (Supplementary Figure S3), with particularly strong differences compared to observations in south-east UK and northern Sweden. Negative lags (2-3 days) are consistently shown along the west-facing coasts for surge-precipitation pairs. Results of the hindcast (Fig. 7) are similar, although they are even closer to the distribution of lags that would be expected in north-western Europe, e.g. discharge arriving earlier than surge in eastern UK and Sweden, in contrast to the western coast of the UK and Denmark (Supplementary Figure S4). Both models show high lags in river discharge along the Spanish and French coasts of the Mediterranean Sea. Overall, the models differ most from the observations in Scandinavia, as it is a difficult region for modelling both river discharge (large presence of natural reservoirs such as lakes and marshes) and storm surge (movement of water through the Danish Straits and complex coastline).

Fig. 5
Time lag in occurrence of storm surge events and (a) precipitation; (b) river discharges, that results in the highest value of the upper tail dependence coefficient (UTDC). River gauges are connected to the nearest tide gauge with sufficient data within a 200 km radius

Fig. 6
Difference in the time lag in occurrence of storm surge events and (a) precipitation; (b) river discharges, that results in the highest value of the upper tail dependence coefficient (UTDC), between reanalysis and observed data. The connections between river gauges and tide gauges from observations are preserved here for comparability with Fig. 5. The modelled and observed data were censored to a common series length

Wave component
Several studies considered the dependence between storm surges and waves as compound events (Wahl et al. 2012;Gouldby et al. 2014;Arns et al. 2017;Petroliagkis 2018). Observations are too scarce to compare this combination, but the effect of waves on the dependence could nonetheless be analysed in the context of compound floods, as waves are important in studying coastal flood hazard (Vousdoukas et al. 2016b, 2018). Available observations allow studying this effect only for south-western UK and southern/central Sweden, plus some locations in Ireland and Norway (Fig. 8a). At almost all river gauges, adding significant wave height series (20% of its value) to storm surge data results in a lower UTDC. The strongest effect is observed in Sweden, indicating low dependence between surge and waves in this region. The reanalysis data, however, show a different picture (Fig. 8b). In large parts of Sweden, a positive effect on the dependence between the coastal water levels and river discharge is observed. In the UK, the effect is similar to observations, while data for the Mediterranean region again show a reduction in UTDC, particularly in southern Spain. This indicates a low probability of a multivariate event, even though impacts of such a flood could be significant. An improvement in data availability is clearly needed as well.
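The wave test described above (adding 20% of significant wave height to the surge series and recomputing the dependence with discharge) can be sketched as follows. The quantile-based UTDC estimate, the function names and the synthetic data are assumptions made for illustration only and do not reproduce the estimator or data of this study.

```python
import numpy as np

def empirical_utdc(x, y, q=0.95):
    """Share of days with x above its q-quantile on which y is also above its q-quantile."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_hi = x > np.quantile(x, q)
    y_hi = y > np.quantile(y, q)
    n_x = x_hi.sum()
    return float((x_hi & y_hi).sum() / n_x) if n_x else float("nan")

def wave_effect(surge, wave_height, discharge, wave_fraction=0.2, q=0.95):
    """Return (UTDC without waves, UTDC with waves, difference).

    The coastal water level with waves is approximated as
    surge + wave_fraction * significant wave height.
    """
    surge = np.asarray(surge, dtype=float)
    total = surge + wave_fraction * np.asarray(wave_height, dtype=float)
    base = empirical_utdc(surge, discharge, q)
    with_waves = empirical_utdc(total, discharge, q)
    return base, with_waves, with_waves - base

# Synthetic example (assumption): discharge fully dependent on surge,
# waves independent of both, so adding waves can only weaken the dependence.
rng = np.random.default_rng(1)
surge = rng.normal(size=3000)
discharge = surge.copy()
wave_height = np.abs(rng.normal(scale=10.0, size=3000))
base, with_waves, delta = wave_effect(surge, wave_height, discharge)
print(base, with_waves, delta)
```

A negative difference corresponds to the reduction seen at most observed gauges; a positive one, as in the reanalysis for parts of Sweden, would require waves that are themselves dependent on discharge.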

Limitations and uncertainty
The accuracy of models in reproducing compound floods is affected by several factors. Some are methodological: the study looks only at a particular threshold for selecting storm surges, a defined window for joining them with precipitation and river discharge series, and a specific threshold in terms of marginal return period for identifying compound events in the reanalysis. Lowering or raising the threshold would increase or decrease the number of identified events: setting it too low would generate many events too frequent to cause any impacts, while setting it too high would result in very few events being identified. Only 17 out of 60 events identified in the reanalysis exceeded a 10-year return period on both margins. A wider window would encompass more precipitation or high discharge occurrences, but reduce the practical implications of such an analysis, as such events would mostly not exacerbate each other or result in "compound" impacts. The study further utilizes daily resolution of the data, as this is the coarsest temporal resolution among the data sets used here. Sub-daily resolution would undoubtedly be a steeper challenge for models in terms of accurately modelling the timing of the drivers of compound events.

Fig. 7
Difference in the time lag in occurrence of storm surge events and (a) precipitation; (b) river discharges, that results in the highest value of the upper tail dependence coefficient (UTDC), between hindcast and observed data. The connections between river gauges and tide gauges from observations are preserved here for comparability with Fig. 5. The modelled and observed data were censored to a common series length

Fig. 8
Difference in upper tail dependence coefficient (UTDC) for the combination of storm surge events and river discharges, when 20% of significant wave height is added to the storm surge data, for (a) observed; (b) reanalysis data. The connections between river gauges and tide gauges from observations are preserved in (b) for comparability with (a)

The amount of observations available varies substantially and is limited particularly in the Mediterranean region. The main metric described here, the upper tail dependence coefficient (UTDC), is sensitive to the short data series in southern Europe, which explains the large difference between observed and modelled dependence in that part of the continent. In north-western Europe, where long data series are more readily available, there is usually less difference between observations and models. Still, all modelled data sets (and the gridded observed precipitation) involve different grid resolutions, which creates a possible spatial mismatch between locations of gauges and grid cells. This is partly accounted for by expanding the search window of compound events by an extra day. Other inaccuracies stem from the data resolution, which, for example, limits the representation of precipitation events in the relatively coarse ERA-Interim climate reanalysis. Also, model grids simplify the coastline in the Delft3D model and sometimes result in inaccurate delineation of rivers and drainage basins in EFAS.
Consequently, the performance of the different models for their particular variables varies spatially, which also partially explains inaccuracies in compound flood representation. Supplementary Figures S5-S7 show the accuracy of individual models. Accuracy of ERA-Interim in estimating precipitation is considerably lower in southern Europe, where the highest difference in UTDC is also indicated. Interestingly, ERA-Interim is better at modelling precipitation on west-facing coasts (more exposed to storms and more prone to compound events) than on east-facing ones. Further, Supplementary Figure S6b also shows a mismatch between two observational data sets of daily precipitation, E-OBS and EFAS-Meteo; the latter is used for the reanalysis of river discharge in EFAS. Very different resolutions result in different precision of precipitation data and introduce errors in matching different grid points. Large differences between observed and modelled discharges are particularly noticeable in Scandinavia, where there are also the highest differences between observed and modelled dependencies for pairs of variables, as well as their time lags. The same happens in the Iberian Peninsula; in both cases, the existence of many reservoirs (natural ones in Scandinavia and artificial ones in Iberia) reduces the accuracy of modelled discharge data. Many compound floods were wrongly indicated by the reanalysis in southern Italy and Greece. This can be attributed not only to the low intensity of the events (as suggested earlier in the paper), but also to the visibly poor performance of all models involved, particularly for precipitation and storm surges.
One important local factor omitted from this analysis is tides. A high tide can contribute significantly to a compound event (as in London in 1928), but tides are, barring nonlinear effects on local sea level (Sterl et al. 2009;Rego and Li 2010), an independent component. Additionally, they need to be analysed with a good temporal resolution, in contrast to the daily data utilized here. Nonetheless, the inclusion of tides in a sub-daily "reanalysis" of past floods would possibly have indicated more events in north-western Europe compared with the results discussed above.
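The sensitivity to the return-period threshold and the search window discussed in this section can be illustrated with a small sketch. It assumes a purely empirical return level (the k-th largest daily value, with k equal to the record length divided by the return period) and treats every exceedance day as a candidate event without declustering; this is a simplification for illustration, not the procedure used in this study, and the function names and synthetic series are hypothetical.

```python
import numpy as np

def return_level(series, years_of_record, return_period):
    """Empirical level reached about once per `return_period` years (k-th largest value)."""
    ordered = np.sort(np.asarray(series, dtype=float))[::-1]
    k = max(int(years_of_record / return_period), 1)
    return ordered[k - 1]

def joint_exceedances(surge, driver, years_of_record, return_period, window=1):
    """Days on which the surge reaches its return level and the driver
    reaches its own return level within +/- `window` days."""
    surge = np.asarray(surge, dtype=float)
    driver = np.asarray(driver, dtype=float)
    s_days = np.flatnonzero(surge >= return_level(surge, years_of_record, return_period))
    d_days = np.flatnonzero(driver >= return_level(driver, years_of_record, return_period))
    return [int(t) for t in s_days if np.any(np.abs(d_days - t) <= window)]

# Synthetic 30-year daily record (assumption) with three planted extremes per driver.
years = 30
n = years * 365
rng = np.random.default_rng(2)
surge = rng.uniform(0.0, 1.0, size=n)
discharge = rng.uniform(0.0, 1.0, size=n)
surge[100], discharge[101] = 5.0, 5.0      # discharge peaks one day after the surge
surge[5000], discharge[5000] = 4.0, 4.0    # simultaneous peaks
surge[9000], discharge[9003] = 3.0, 3.0    # peaks three days apart

events_10yr = joint_exceedances(surge, discharge, years, return_period=10)
events_5yr = joint_exceedances(surge, discharge, years, return_period=5)
events_10yr_wide = joint_exceedances(surge, discharge, years, return_period=10, window=3)
print(events_10yr, len(events_5yr), events_10yr_wide)
```

Lowering the return period can only add candidate events, and widening the window admits pairs that are further apart in time, which mirrors the trade-off described above.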

Conclusions
Compound floods have been studied so far in many settings, with a very wide range of definitions, variables, dependence measures, thresholds, time windows, spatial scales and data sets. There is currently no agreed way to calculate compound flood hazard, or to evaluate model performance in this context. This study has analysed and discussed whether existing high-resolution models can reproduce the dependence between the drivers of compound floods. This is important in the context of making predictions of changes in the probability of compound event occurrence under climate change.
The study has shown strong dependencies in surge-precipitation and surge-discharge pairs along many north-western coasts of Europe. The performance of models driven by the ERA-Interim reanalysis and the RCA4 hindcast was also rather satisfactory in this region, though some overestimation of surge-precipitation dependence was found, mainly around the English Channel and North Sea. In southern Europe, the surge-precipitation combination was not properly reproduced (strong underestimation), though only a relatively small number of observations is available in this region. On average across Europe, the surge-precipitation dependence was narrowly overestimated by models, while the surge-discharge dependence was underestimated. The surge-wave dependence was not discussed due to the limited number of observations, but a strong reduction in dependence between combined surge/wave and discharge was shown.
Past compound events obtained from the reanalysis had at least some plausibility based on historical flood and damage reports in two-thirds of cases. Otherwise, low population density, limited data availability for certain countries or low intensity of the identified event could sometimes explain the occurrence of apparent false positives. However, several large historical compound floods were missing from the reanalysis (false negatives). Nonetheless, the results show that a data set from independent simulations still has the potential to capture a large portion of the dependencies between different compound flood drivers. This gives at least some degree of confidence in the possibility of making predictions of compound floods under climate change.
Data underlying the results presented here are accessible on figshare (https://doi.org/10.6084/m9.figshare.11400561), except for the list of past events identified in the reanalysis, which is contained in Supplementary Information 1.