Multivariate return period for different types of flooding in city of Monza, Italy

The return period is a probabilistic criterion used to measure and communicate the random occurrence of geophysical events such as floods in risk assessment studies. Since an individual risk may be strongly affected by the degree of dependence amongst all risks, the need for the provision of multivariate design quantiles has gained ground. Consequently, several recent studies have focused on estimation of multi-hazard risk resulted from different hazard types. In this study, multi-hazard return periods are derived for riverine and pluvial floods in city of Monza, Italy, based on different copula dependence structures. It is shown that ignoring statistical dependence among different inter-correlated hazards may cause significant misestimation of risks.


Introduction
The concept of "Multi-hazard" is defined by the European Commission, in 2010 as different hazards either occurring at the same time or shortly following each other, because they are dependent from one another or because they are caused by the same triggering event or hazard, or merely threatening the same elements at risk without chronological coincidence. This definition implies considering possible interactions and/or cascade effects between different possible hazardous events threatening a common area (Liu et al. 2015).
Many regions of the world have complex hazard landscapes, wherein risk(s) from individual and/or multiple extreme events is omnipresent (Pourghasemi et al. 2020). Hence, greater attention has been given to advancing the theory of assessing risk from multiple hazards in recent years.
Several studies gain multivariate statistics for return period estimation based on different attributes of extreme events to assess the induced risk, comprehensively (Volpi 2019). However, it has had limited application to assess the risk of different types of hazards. Current approaches may consider temporal interactions between hazards. In the simplistic approach, the risk associated with multiple hazards is evaluated through adding up all the probable risks within an area, ignoring interactions between hazards (e.g. Carpignano et al. 2009). In another approach, occurrence of a hazard could change likelihood of occurrence of another hazard(s) (e.g. Bell and Glade 2012;Kappes et al. 2012;Kreibich et al. 2014). However, it is usually difficult to quantify association between hazards over certain areas due to rarity of extreme events.
Three main types of floods include pluvial, riverine and coastal floods. pluvial floods are caused by extreme rainfall events during which water cannot be sufficiently processed by existing drainage systems (Falconer et al. 2009;Yin et al. 2015). Riverine floods are floods due to overflows from large rivers. They generally cause more loss in properties while pluvial floods cause more life losses (Tockner and Stanford 2002;Maggionia and Massari 2018). Coastal floods occur when normally dry, low-lying land is flooded by seawater.
Since urban areas have exposed elements of much higher significance in comparison with rural territories, flood hazard assessment in urban area is off great importance. Several recent studies are concerning flood hazard assessment in urban areas. However, few recent examples in the literature consider the hazard due to different flood types in these areas (Leitner et al. 2019;Galuppini et al. 2020;Jiang et al. 2019), while not discussing the hazard of concurrent multi-source floods.
To assess hazard of concurrent riverine and pluvial floods, a simple framework is introduced in this paper. Since floods are usually triggered by extreme precipitation, the proposed framework employs tail dependence of the joint bivariate distribution of precipitation at upstream and at the city of Monza. Summed values of two upstream rain gages are assumed to be representative of riverine floods and the values of the rain gauge at the city are assumed to be representative of pluvial floods. Based on the tail dependence, we examined different dependence structures between depth of riverine floodwater and pluvial floodwater and computed joint return period for different combinations of return periods of riverine and pluvial floods.

Study area
Monza is a city in Northern Italy with a population of about 124,000 inhabitants. It is the third most populous urban center in the Lombardy region. It features a total surface of about 33 square km and has an average altitude of 162 m above sea level (Galuppini et al. 2020). Monza is crossed from north to south by the river Lambro. There is an artificial fork of the river known as Lambretto which rejoins the main course of the Lambro as it exits to the south, leaving Monza. Monza periodically suffers from inundation due to overflows from the Lambro river and from its channel Lambretto. Figure 1 illustrates the study area. Monza city (indicated by star in Fig. 1) can be affected by two flood sources: flooding from a river, and pluvial floods due to inadequate drainage in the city area itself. To evaluate the flood risk from river, as well as the risk of inundation, we try to define the joint distribution of flood water depth due to the two hazards. For the sake of simplicity, it is assumed that their correlation is reflected in tail dependence of rainfall events at the city and the summed value of two upstream rain gauges (Cremella and Costa Masnaga) next to the Lambro river.

Data
Data used in the study consists of 6-h precipitation obtained from three rain gauges: Monza city in the period of 1987 to 2005, Cremella in the period of 2003 to 2007, and Costa Masnaga in the period of 1997 to 2005. Values of two upstream rain gages, i.e. Cremella and Costa Masnaga, are summed to be representative of the upstream subbasin (respectively, P 1 and P 2 in Fig. 1). Since available historic rainfall data are too short for adequate analysis of the system, a multisite precipitation data generator is used to extend the 6-h precipitation data at the sites.

Copulas
A convenient way to deal with multivariate variables, when they are generally dependent, is to use Copulas (Nelsen 2006). Copula functions allow to separate the specification of marginal distributions and their dependence structure and describe it as ..d is the joint probability distribution for variables x 1 , … , x d , and C 1…d is the copula function (Sklar 1959;Durante and Sempi 2015).
Empirical estimate of copula ̂ 1…d is defined as a rank-based empirical measure of joint cumulative probability (Genest and Rémillard 2004). For the bivariate case and a sample of size n from a continuous bivariate distribution (Zhang and Singh 2019)

Joint return periods (JRPs)
Let's start with the simple bivariate case where the overall space is divided into 4 quadrants (Fig. 2).
A multivariate copula-based framework can be used for various definitions of multivariate hazard scenarios (e.g., AND, OR, Kendall and Survival Kendall); whether to each of them depends on what situations will trigger the failure of the structures or system under study (Salvadori et al. 2011(Salvadori et al. , 2013. In the OR case, i.e. the case of this study, it is sufficient that one of the variables X 1…d exceeds the corresponding threshold x i (sum of 2nd, 3rd and 4th quadrant of Fig. 2). (1) (3)

Tail dependence index
The concept of tail dependence index provides a robust tool to measure the intensity of dependence in the lower/upper tail of a bivariate distribution. Since we deal with fat-tail distributions, the bivariate returns X 1 and X 2 are transformed to unit Fréchet margins S and T (Schmuki 2008).
where F X 1 and F X 2 are marginal distribution functions for X 1 and X 2 . Variables S and T have the same dependence structure as X 1 and X 2 , and are on a common scale. Using conditional probability, the dependence index in the upper tail region of the distributions ( u ) may be expressed as According to Fig. 2, the equation can be rewritten as S and T have common scale, meaning that events of the form S > s and T > s for large t, correspond to equally extreme events for each variable (Poon et al. 2003). Hence, we can rewrite the last equation as Since T and S are injective and monotonically increasing transformations of respectively X 1 and X 2 , Replacing limits in Eq. (9) with specific proportion (r) enables empirical estimates of . Here, we use the sample version of u , based on n pairs of data (X 1 , (Bormann and Schienle 2020). Values of lie in the interval [−1, 1] . If X 2 is a strictly increasing function of X 1 , then = 1 , and when X 2 is a strictly decreasing function of X 1 , then = −1 for all values of u (Reiss and Thomas 2007).

How to specify the extreme value threshold
Since is a quantile-dependent measure of dependence, assuming a threshold quantile above (or below) which values are considered as extremes is required. It is assumed that extreme events above the threshold cause flooding. There are different methods to calculate . Frahm et al. (2005) suggested an interesting kernel plateau-finding heuristic procedure to estimate the optimal extreme value threshold. In this method, the optimal plateau is estimated in four steps: (1) a kernel box with a bandwidth of b (e.g., b = int(0:05n)) is selected; (2) the n values of for any r of 1∕n, 2∕n, … , 1 is calculated; (3) for box kernel with bandwidth b, the corresponding values fall within each box are calculated, results in n-2b average values ( ); (4) a moving plateau with a length of l = int( √ n − 2b ) is defined as a vector of p k = ( k , …, k+l−1 ) where k = 1,…,n-2b-l + 1. (5) the algorithm stops at the first plateau fulfilling the criteria where σ represents the standard deviation of average values 1 ,…, n−2b . The tail dependence index estimate is set to If no plateau fulfills the stopping condition, the tail dependence index is estimated zero (for more detailed description, the reader is referred to Frahm et al. (2005) and Peng (1998)).

Precipitation data generation
Since the available historic rainfall data are too short for adequate systems study, a multisite precipitation data generator is used to extend the 6-h precipitation data at the sites. The algorithm consists of an efficient stochastic generation of multi-site daily precipitation data, using the spatially correlated random numbers approach and a non-parametric modified KNN model to disaggregate daily to 6-h data. The data generation algorithm, which is a modified well-known Wilks' approach originates from the work of Brissette et al. (2007).
Exponential, gamma, log-normal, normal, Weibull and Gumbel distributions were considered to describe the daily data. Gumbel was selected as the best-matching distribution for the data. Hence, using the data generator algorithm and based on the statistical characteristics of the observed data, 1000 years of daily precipitation at all three sites of the study were simulated using Gumbel distribution.

Disaggregation of daily into 6-h precipitation data
Since design rainfalls are often used for flood risk assessment, we used 6-h precipitation data as triggers of both riverine and pluvial flood in Monza. In order to disaggregate the daily precipitation data into the 4 values of 6-h data, a method based on modified K-nearest neighbor (KNN) algorithm proposed by Nowak et al. (2010) is used. The required information about sub-daily rainfall depth was used in the modified KNN resampling method. Moreover, a three-day rainfall pattern classification is adopted based on the work of Park and Chung (2020). The modified KNN resamples precipitation data only among events with same three-day pattern. The method is applicable for disaggregating values of daily rainfall into any shorter sub-daily fragments.
In the classification scheme, the seven rainfall patterns are presented (Fig. 3). When the rainfall day in the middle is D, rainfall patterns "decrease" from previous day (D − 1) to subsequent day (D + 1) in Types 1, 3, and 5. Rainfall patterns "increase" from previous day (D − 1) to subsequent day (D + 1) in types 2, 4, and 6. Lastly, Type 7 shows the "zero" amount of rainfalls in previous (D − 1) and subsequent (D + 1) days. Following Frahm et al. (2005), the kernel plateau-finding is employed to estimate the optimal extreme value threshold for the bivariate distribution of 6-h precipitation of "Monza" and "Lambro SUM". Consequently, the threshold is used in calculation of the upper tail dependence coefficient of the bivariate distribution. In Fig. 4, the box shows the plateau that satisfies the criteria expressed in Eq. (11) and its corresponding tail dependence index (the box size is not scaled).

Results and discussion
The calculated value of tail dependence coefficient (0.29) is regarded as the value of Spearman's rank correlation coefficient (ρ) between depth of riverine and pluvial floodwater. The value is calculated based on Eq. (12), where values of tail dependence index for each moving window is derived from Eq. (10).
The graphical analysis of the tail dependence is carried out based on the Chi-plot (Genest and Boies 2003). The chi-plot is designed to provide a graph that has characteristic patterns depending on whether the variables are independent, have some degree of monotone relationship, or have more complex dependence structures.
The chi plot depends on the data only through the values of their ranks, showing versus . At each sample point, i is actually a correlation coefficient between dichotomized X 1 values and dichotomized X 2 values. The value λ i is a measure of the distance of data points from the center of the data set. Positive values of λ i belong to the first and the third  Figure 5a shows the scatter plot of data pairs corresponding to 6-h precipitation of Monza and Lambro SUM. Moreover, it includes the chi plot of the same data (Fig. 5b). Chi plots of the first and the third quadrants are shown separately in Fig. 5c and d. For the chi plot in the upper panel, positive dependence of the data is shown over almost all the complete range. However, the bent course of the chi plot indicates less dependence at the tails. Since we are specifically interested in the upper tail of the bivariate distribution, chi-plot of data in the third quadrant is desired (Fig. 5d).
Significant non-zero values of tail-dependence index at large values of λ i in the lower right panel shows strong upper tail dependencies. Moreover, separation of chi plot into the distinct chi plots regarding the first and the third quadrants shows asymmetries at the tail-dependencies, i.e. a stronger tail dependency in the upper tail than the lower one.
In order to model the dependence structure between depth of riverine floodwater and pluvial floodwater, tail-dependent copulas should be deployed. On the other hand, since inter-dependency is higher for large flood events than for small floods, upper tail dependence of the copula family should be stronger than the lower. Therefore, among common copula families in hydrology, Gumbel, Joe and flipped Clayton are selected to be used for multi-hazard return-period of floods. Figure 6 shows counter plots of these three types of copulas, having asymmetric behaviour and strong upper tail dependence.
In order to have a comparison with case of independence between different flood types, different multi-hazard return periods are reported in Table 2.
Comparing values of return period in Table 1 for different copula types, highlights the role of tail dependence structure of copulas in estimating joint return periods. As it is shown in Fig. 6, the same point in marginal probability space yields far larger values for Joe copula than two other copulas. Besides, flipped Clayton copula has a value slightly less than the value of Gumbel copula. As it is shown in Eq. (3), larger copula values produce larger values of joint return period. Hence, Joe copula has the largest values of joint return period in Table 1, and flipped Clayton copula has the smallest values of joint return period which is close to the values of Gumbel copula.
As shown in Fig. 7, the upper tail dependency increases through the Pearson's rank correlation coefficient (ρ). Higher values of the upper tail dependence index (λ upper ) can be interpreted as greater dependence when natural hazard-induced disasters are particularly severe as is the case for riverine and pluvial floods. Flipped Clayton and Gumbel copulas don't show an upper tail dependence as strong as that of Joe copula with the same Spearman's rank correlation coefficient. Joe copula, on the contrary, shows stronger dependency for extreme values than for all values together, i.e. λ > ρ. Hence, Joe copula is the only copula family that can adequately model the contingent nonlinear dependencies between individual hazards and is the most recommended copula family to estimate the joint return period in this case. Comparing values of return period in Tables 1 and 2 indicates that the assumption of independence leads to lower estimated values of return periods which could cause overestimation in multi-hazard risk assessment.  Since the pluvial flooding may be driven by shorter rainfall events (i.e. three hours or less), the study was also conducted using hourly data. The results also highlighted the fact that the naive assumption of independence leads to spurious low values of return periods in multi-hazard risk assessment. The results corresponding to use of hourly data are reported in appendix in ESM.

Summary and conclusion
Comprehensive risk assessment is a crucial element in the insurance business. Flood cover provided by insurance companies usually doesn't distinguish between different flood types in urban areas. However, since different mechanisms cause riverine and pluvial floods, different approaches should be adopted to assess the risk due to each type.
In order to develop the dependence structure of two flood types, we proposed a simple approach based on statistical characteristics of tails of joint distribution of parent precipitation events. 6-h values of precipitation at upstream and at the city of Monza are considered as possible factor which trigger riverine and pluvial flooding, respectively.
In order to build a dependence structure of water depth resulted from two different types of floods, three common families of copula with stronger upper tail dependence are considered. It was shown that in case of incorrect assumption of independence, multi-hazard return period of floods is underestimated which leads to overestimation of the assessed risk due to floods. In case of acquisition of concurrent data on pluvial and riverine floods, it is possible to evaluate the estimation of the joint return periods by each three copula families (i.e. Joe, flipped Clayton and Gumbel) and adopt the best statistically fitted copula. However, usually available data about the concurrent occurrence of such extreme events are rare. Hence, it is not an easy job to select one of them. In this case, using Joe copula is recommended to estimate the joint return periods due to its higher upper tail dependency for the same correlation coefficient.
Usually, available data about the concurrent occurrence of extreme events are rare. Hence, using the available data, quantification of multi-hazard interrelationships is not an easy job. The proposed method is a straightforward and convenient way to quantify multi-hazard correlations using data of parental events, which are more accessible. It could also be used to quantify hazards that result from a combination of drivers. For example, the precipitation and temperature data could be adopted in this framework to monitor and quantify agricultural droughts.
To estimate the concurrent hazard of riverine and pluvial floods, the same temporal resolution should be used for both types of floods. It should be chosen short enough to detect pluvial floods, while long enough to detect fluvial floods. It is a challenging task that needs an expert's knowledge about the basin. we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.