1 Introduction

Urban pluvial flooding is determined by the interaction of the spatial layout of urban drainage infrastructure and the spatio-temporal structure of rainfall (e.g., Smith et al. 2002, 2005; Ramos et al. 2005; Morin et al. 2006; Wright et al. 2013; Yang et al. 2013). Therefore, proper representation of the meteorological forcing of urban hydrologic systems is an essential aspect of predicting the performance of the underlying drainage infrastructure. Simulations that reproduce the space–time patterns of rainfall associated with preferred storm speeds and tracks can be used to improve the performance assessment, operation, and design of urban drainage infrastructure (Singh 1997; McRobie et al. 2013). The goal of this study is to credibly simulate extreme rainfall fields in order to quantify the uncertainty of the urban sewer system under different rainfall and infrastructure scenarios.

A common shortcoming of many efforts to assess pluvial flooding is the use of spatially uniform design storms estimated based on the return period of point rainfall data (e.g., Berne et al. 2004; Zhou et al. 2012; Notaro et al. 2013; Gires et al. 2015). Neglecting the spatial variation in the precipitation field, however, is an oversimplification in many cases and does not capture the variation in flood response to different spatial precipitation distributions, even when the design storm return period is fixed (Wheater et al. 2005; Simões et al. 2015). Until relatively recently, however, the spatial distributions of storms at sub-urban scales were not well measured.

A high spatial–temporal resolution dataset for precipitation is now available for most of the United States and can be implemented in hydrologic analysis of dense urban environments (e.g., Smith et al. 2002; Gourley et al. 2014). While the tools for statistical modeling of point-based extreme events are well-developed (e.g., Coles et al. 2001), extending these tools to model spatial extreme data is an active area of research. Some approaches use rainfall generators to simulate precipitation fields and investigate the urban drainage design using parametric (Willems 2001a; McRobie et al. 2013; Nuswantoro et al. 2014; Simões et al. 2015) or nonparametric approaches (Harrold et al. 2003; Mehrotra et al. 2015). Parametric simulation approaches (e.g. Apipattanavis et al. 2007; Chen et al. 2011) typically require assuming that the precipitation data is multivariate normal and often do not preserve cross dependencies that exist in the data. Existing nonparametric approaches, on the other hand, often do not consider the spatial dependence of the rainfall field between the grids (e.g., Harrold et al. 2003). The non-parametric simulation approach employed in this study has the advantage of avoiding assumptions about the data distribution, while preserving the empirical spatial dependence based on the historic extreme precipitation fields.

In this study, we applied the multivariate simulation method described by Lall et al. (2016) to extreme rainfall fields for the first time. The procedure allows Monte Carlo simulation of multiple variables with differing marginal and joint distributions. We used daily radar-derived rainfall data (spatial resolution of 4 km by 4 km) to identify extreme rainfall fields and imported them into the simulator. The simulated extreme rainfall fields and city infrastructure information were used to compute excess runoff through the Natural Resources Conservation Service (NRCS) approach. We introduced an innovative framework based on a simulation approach that accounts for the spatial structure of extreme precipitation. That is, we did not use a parametric spatial model, but instead preserved the empirical dependence between grid cells via the method developed in Lall et al. (2016). We used this simulation model to estimate both the excess runoff under changes in city infrastructure (addressing source control stormwater management) and the probability of exceeding the treatment capacity of the city under different rainfall scenarios (addressing end-of-pipe stormwater management). We applied the framework to New York City (NYC) as a case study, since the city has been facing the challenge of frequent extreme weather events, sewer system overflow, and flooding (Spierre and Wake 2010; Environmental Protection Bureau of the NYS 2014). Accounting for spatial structure is especially important for extreme rainfall in NYC given the distinct spatial patterns that have been shown to exist there (Hamidi et al. 2017). The results of this study are beneficial for planners working on stormwater management, and the approach is broadly applicable because it does not rely on extensive sewer system information (e.g., catch basins’ exact locations, size and connections of the sewer pipes) as do many other urban stormwater models (e.g. Pina et al. 2016).

The paper is organized as follows. The study area and data are introduced in Sect. 2. In Sect. 3, the methodology used to simulate extreme rainfall fields and compute excess runoff is described. In Sect. 4, the results are presented and discussed. Finally, we provide the conclusions of the study in Sect. 5.

2 Case study

Average annual rainfall in NYC has increased nearly 20 mm in the last century (http://www.nyc.gov/dep) and climate projections indicate the potential for increasingly frequent intense storms (Horton et al. 2010). These facts make the City a compelling case study for urban hydrology (e.g., Rosenzweig et al. 2007; Cherrier et al. 2016). Today, much of the stormwater in NYC flows over impervious surfaces, which cover approximately 72% of NYC’s ~ 790 km² land area, into roof drains or catch basins located on street and highway curbs and into the sewers (NYCDEP 2012a). More than 60% of NYC’s sewer system is combined, meaning it is used to convey both sanitary and storm flows. During heavy rainfall (or rapid snowmelt) events, combined sewers receive higher than normal flows. Because the Waste Water Treatment Plants (WWTPs) are unable to treat sewer flows that are more than twice their design capacity, a mix of excess stormwater and untreated wastewater is often discharged from outfalls directly into the City’s waterways to prevent upstream flooding. This untreated release is called a Combined Sewer Overflow (CSO). CSOs are a concern because of their negative effects on water quality in local waterways (Cherrier et al. 2016). For example, during 2004–2005 there were 35 CSOs recorded at two outfall locations of the Bronx River in the Bronx, NYC (De Sousa et al. 2012).

For the nearly 40% of NYC’s sewer system that is separated, stormwater runoff is discharged directly into the City’s waterways while sewage (e.g., industrial, commercial) is routed to the WWTPs. These portions of the system operate under the NYC Municipal Separate Storm Sewer System plan (NYC MS4 Progress Report 2016). In addition to the separated and combined systems, small portions of NYC close to the rivers have direct drainage systems where untreated water is conveyed directly into waterways. Figure 1a indicates the divisions of the NYC sewer system. Figure 1a also shows the location of the study area and the surrounding lands and rivers.

Fig. 1

The New York City sewer system features: a sewer system of the City, b WWTP groups calibrated based on rain gauge stations, c NYC WWTPs’ location and design capacity (2 × DDWF), d locations of combined sewer system outfalls

Currently the WWTPs in NYC are designed based on simulated flowrates derived with rainfall from 4 rain gauge stations: Central Park (CP), LaGuardia Airport (LGA), John F. Kennedy International Airport (JFK), and Newark International Airport (EWR), as shown in Fig. 1b. Rainfall rates are assumed to be uniform across each group of WWTP drainage areas, and each group is treated independently. Therefore, the current NYC Department of Environmental Protection (DEP) calibration model considers neither the spatial variation of rainfall within the drainage areas nor the spatial dependence between them. The hypothesis of this study is that considering the spatial variation and dependence of extreme rainfall between grid cells will produce more realistic design criteria, particularly given the spatial clustering of extreme rainfall shown in Hamidi et al. (2017).

2.1 Data

2.1.1 NYC sewer system data

There are 14 WWTPs located along the coast and waterways in NYC (Fig. 1c). Each of the WWTPs is sized based on a Design Dry Weather Flow (DDWF), with total plant treatment capacity equal to 2 × DDWF. This results in a citywide treatment capacity of approximately 8.4 × 10⁶ m³/day (1845 MGD). Figure 1c indicates the division of sewersheds and the locations of the 14 WWTPs. Locations of the 451 combined sewer outfalls are shown in Fig. 1d. These data are available in the 2012 DEP report (NYCDEP 2012a) and in the Open Sewer Atlas NYC (https://openseweratlas.tumblr.com). The land area normalized treatment capacity (i.e., \(2 \times DDWF/Area\)) for the full city is 27.2 mm/day. We made the simplifying assumption that all of the NYC sewer system is combined.

According to the available open-source data of the City (shared at http://www.arcgis.com), there are on average 3110 catch basins per sewershed. The catch basins installed in NYC have a small storage volume of ~ 1.6 m³ each (NYCDEP 2009), giving a potential citywide storage of ~ 7 × 10⁴ m³ (18.5 MG) of runoff per event, which should be considered in calculating the runoff. Table 1 lists all 14 WWTPs, their corresponding sewershed area, the number of catch basins, and the radar grid cell numbers, which are described in the next section.
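The citywide catch basin storage quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: it assumes the average of 3110 basins applies to each of the 14 sewersheds and uses the US-gallon conversion factor, neither of which is spelled out in the text.

```python
# Figures quoted in the text.
N_SEWERSHEDS = 14             # one sewershed per WWTP
BASINS_PER_SEWERSHED = 3110   # average number of catch basins per sewershed
BASIN_VOLUME_M3 = 1.6         # approximate storage of one catch basin

# Assumption: MG refers to millions of US gallons.
M3_PER_MILLION_GALLONS = 3785.41

total_m3 = N_SEWERSHEDS * BASINS_PER_SEWERSHED * BASIN_VOLUME_M3
total_mg = total_m3 / M3_PER_MILLION_GALLONS

print(f"storage per event: {total_m3:.0f} m3 (about {total_mg:.1f} MG)")
```

The result lands close to the rounded ~ 7 × 10⁴ m³ (18.5 MG) reported above.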

Table 1 The New York City WWTPs’ features

2.1.2 Radar rainfall data

The Next Generation Weather Radar system (NEXRAD) is comprised of 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and at select overseas locations (Heiss et al. 1990). While single radar records may suffer from blockage at certain locations (Vivekanandan et al. 1999; Lang et al. 2009) as well as range limitations, multi-sensor (gauge, radar, and satellite) products minimize these errors (Miller et al. 2010). Multi-sensor precipitation estimator algorithms provide a real-time suite of gridded products at different spatial scales (Kitzmiller et al. 2013). In this study, the NCEP (National Centers for Environmental Prediction) Stage IV radar product was employed to generate extreme rainfall fields over the NYC area. Stage IV radar data are mosaicked from the regional multi-sensor precipitation analyses. These data are calibrated and adjusted for biases using automatic rain gauge measurements and quality control processes (Lin and Mitchell 2005). The data are reported at a spatial resolution of 4 km by 4 km and a temporal resolution of 1 h and are available from the Earth Observing Laboratory (http://data.eol.ucar.edu) from 2002 to present. Figure 2a shows the 76 radar grid cells that cover the entire land area of NYC (also see Table 1). Daily radar data from 2002 to 2015 (14 years) were used in this study to identify and simulate extreme rainfall fields in NYC. The Stage IV radar rainfall has been used in urban pluvial flood analysis research before (e.g., Gourley et al. 2014).

Fig. 2

a Stage IV Radar grid network covering NYC, b Curve Numbers aggregated for the sub-grid cells with respect to NRCS normal condition for NYC land cover

2.1.3 Rain gauge rainfall data

Rain gauge observation data were used in this study to provide a comparison for the radar data analysis. We used daily data from the four rain gauge stations introduced in Sect. 2: CP, LGA, JFK, and EWR (see Fig. 1b). The data are archived at and available from the National Climatic Data Center (NCDC). We used the same time frame as for the radar data (i.e., 2002–2015).

2.1.4 Land cover and permeability data

High resolution spatially distributed land cover data for NYC was provided by the Department of Parks and Recreation in 2010 (https://data.cityofnewyork.us/) at a spatial resolution of ~ 0.9 m (3 ft) by 0.9 m. There are different estimates of surface infiltration, and thus various ways to compute runoff (e.g., the Horton 1933 equation). We used the NRCS approach for urban area runoff estimation (Cronshey 1986), as proposed by the United States Department of Agriculture (USDA). The NRCS, formerly the Soil Conservation Service, developed the runoff Curve Number (CN) from empirical analysis of small catchment runoff. CN represents the hydrologic soil cover complex of the watershed with respect to the soil type, land use, surface condition, and the Antecedent Moisture Condition (AMC). Three levels of AMC are considered: AMCI, dry soil (but not to the wilting point); AMCII, the average case; and AMCIII, saturated soil. The development of these procedures is outlined in NEH-4 (National Engineering Handbook, Section 4: Hydrology, Soil Conservation Service 1985), and briefly explained in Sect. 3 of this paper. Generally, CN = 100 for impervious and water surfaces and 0 < CN < 100 for natural surfaces. We converted the land cover data to CN values for the normal moisture condition (AMCII), which is also consistent with the runoff coefficients of NYC DEP (NYCDEP 2012b). The average CN value for each intersected area of radar grid cells with the sewershed borders is shown in Fig. 2b.

3 Methodology

3.1 Generating extreme rainfall fields

The extreme rainfall fields for this study were generated as follows:

  1. The 95th percentile (R95) rainfall for each grid cell was computed (only non-zero rainy days were considered). The average of R95 among the 76 grid cells for NYC is ~ 31 mm/day (1.2 inch/day) with standard deviation of ~ 1.6 mm/day (0.06 inch/day).

  2. Daily extreme events were identified as any day when any of the 76 grid cells exceeded its R95. This resulted in a total of 266 unique extreme event days for NYC (i.e., an extreme event occurs on average every ~ 19 days). The average of maximum rainfall among the 266 events is equal to ~ 52 mm/day (2.1 inch/day) with standard deviation of ~ 24 mm/day (0.96 inch/day).

  3. The 5-day antecedent rainfall at each grid cell was computed for each event. The average of maximum antecedent rainfall is ~ 5.5 mm/day (0.2 inch/day) with standard deviation of ~ 6 mm/day (0.24 inch/day). These data are used to compute the runoff as explained later in this section. 14% of extreme events occurred during boreal winter (Dec–Feb), 18% during spring (Mar–May), 41% during summer (Jun–Aug), and 27% during autumn (Sep–Nov).
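The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the study; the function name, the array layout (days by grid cells), and the choice to treat antecedent rainfall as the total over the five preceding days are assumptions made for the example.

```python
import numpy as np

def identify_extreme_events(rain, q=95.0):
    """rain: array (n_days, n_cells) of daily rainfall in mm/day.

    Returns the per-cell threshold, the indices of extreme days, and the
    rainfall total over the 5 days preceding each extreme day, per cell.
    """
    rain = np.asarray(rain, dtype=float)
    # Step 1: percentile over rainy (non-zero) days only, cell by cell.
    wet = np.where(rain > 0, rain, np.nan)
    r95 = np.nanpercentile(wet, q, axis=0)
    # Step 2: a day is extreme if any cell exceeds its own threshold.
    exceed = rain > r95
    event_days = np.flatnonzero(exceed.any(axis=1))
    # Step 3: antecedent total via a cumulative sum with a leading zero row,
    # so that csum[k] holds the sum of rain[0:k].
    csum = np.vstack([np.zeros((1, rain.shape[1])), np.cumsum(rain, axis=0)])
    start = np.maximum(event_days - 5, 0)
    antecedent = csum[event_days] - csum[start]
    return r95, event_days, antecedent
```

Events in the first five days of the record receive a shorter antecedent window here; how the study handled that edge case is not stated.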

The spatial dependence of grid cell rainfall is investigated in Fig. 3, which shows the percentage of concurrent extreme rainfall events at each pair of grid cells. Dark blue shading indicates that the corresponding grid cells experienced precipitation greater than R95 for many of the same extreme events. The concentrated areas of dark blue shading along the diagonal of Fig. 3 illustrate that grid cells belonging to the same sewershed are highly dependent (see Table 1 for grid cell index locations). There is also spatial dependence between sewersheds, as illustrated by the off-diagonal areas shaded with dark blue. For example, there is about a 75% chance that an extreme event was present at the Jamaica (JAM) grids (G16–G31) given that an extreme event was present at the Red Hook (RH) grids (G55–G61). The probability of ~ 75% is determined by averaging the values of the grid cells that fall within the intersection of the parallel lines in Fig. 3.

Fig. 3

Percentage of concurrent extreme events occurring in NYC grid cells (e.g., the intersection of parallel lines indicates there is ~ 75% chance that an extreme event was present at the Jamaica (JAM) grids, G16–G31, given that an extreme event was present at the Red Hook (RH) grids, G55–G61)
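A concurrence matrix of the kind plotted in Fig. 3 can be computed from a boolean exceedance table. The sketch below is one plausible reading, not the study's code: it assumes each entry is normalized by the number of extreme events at the conditioning cell (consistent with the "given that" wording), and the function names are hypothetical.

```python
import numpy as np

def concurrence_percent(exceed):
    """exceed: boolean array (n_events, n_cells), True where a cell exceeded its R95.

    Returns M with M[i, j] = percentage of cell i's extreme events on which
    cell j was also extreme.
    """
    e = np.asarray(exceed, dtype=float)
    joint = e.T @ e            # joint[i, j]: events extreme at both i and j
    counts = np.diag(joint)    # number of extreme events at each cell
    with np.errstate(divide="ignore", invalid="ignore"):
        return 100.0 * joint / counts[:, None]

def block_mean(matrix, given_cells, target_cells):
    """Average the entries for one sewershed's cells given another's."""
    return matrix[np.ix_(given_cells, target_cells)].mean()
```

With zero-based indices for the RH cells as `given_cells` and the JAM cells as `target_cells`, `block_mean` corresponds to the averaging over the intersection of parallel lines described in the caption.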

The rain gauge data for the same period (2002–2015) were processed as follows:

  1. The 95th percentile, \(R_{95}^{\prime}\), of precipitation was identified at each rain gauge station as the extreme rainfall threshold. The average of the \(R_{95}^{\prime}\) threshold among the 4 stations is ~ 36 mm/day (1.5 inch/day).

  2. The extreme events were identified for each station by applying the corresponding \(R_{95}^{\prime}\) threshold. This resulted in 89, 85, 84, and 86 extreme event days at the CP, LGA, JFK, and EWR stations, respectively. The average of maximum rainfall among the four stations was ~ 180 mm/day (7.1 inch/day). The different number of events at each station is an artifact of the decision to estimate \(R_{95}^{\prime}\) from rainy days only (the total number of which is not constant across the four stations).

  3. The 5-day antecedent rainfall was calculated. The average of antecedent rainfall among the four stations was ~ 40 mm/day (1.6 inch/day). 13% of events occurred during boreal winter, 23% during spring, 36% during summer, and 28% during autumn, similar to the seasonality of the radar-derived extreme rainfall days.

3.2 Spatial simulator algorithm

Multivariate simulations are often necessary for risk analysis (Rajagopalan et al. 1997; Vogl et al. 2012; Lall et al. 2016; Xu et al. 2017). In such a case, the dependencies between all variables (here, the individual rainfall grid cells) that define the spatial field (here, the extreme rainfall field) should be preserved by the simulation framework. This is because the use of a simple univariate approach could lead to considerable over- or underestimation of the risk associated with a given event (Raynal-Villasenor and Salas 1987; Bruneau et al. 1994). Furthermore, the use of standard multivariate distributions with Gaussian structure is not reasonable if the marginal distributions are non-normal (e.g., heavy tailed asymmetric distributions: Titterington et al. 1985; West 1992; Meylan et al. 2012). Copulas have been shown to be a useful way to model the dependence structure independently of the marginal distributions (Sklar 1959), which more easily allows one to model dependent, non-Gaussian data, as is the case here.

Let \(F(x)\) be the joint distribution of multiple random variables \(x = \left( {x_{1} ,x_{2} , \ldots ,x_{m} } \right)\) and let \(F_{i} \left( {x_{i} } \right)\) be the marginal distribution function of variable \(x_{i}\), where i goes from 1 to m. A copula is a function that links the joint distribution \(F(x)\) to its univariate marginals \(F_{i} \left( {x_{i} } \right)\). Sklar (1959) proved that for every multivariate distribution \(F(x)\) there exists a copula \(C:\left[ {0,1} \right]^{m} \to \left[ {0,1} \right]\) such that:

$$F\left( {x_{1} ,x_{2} , \ldots ,x_{m} } \right) = C\left( {F_{1} \left( {x_{1} } \right),F_{2} \left( {x_{2} } \right), \ldots ,F_{m} \left( {x_{m} } \right)} \right)$$
(1)

where \(F_{i} \left( {x_{i} } \right) \sim U[0,1]\). When the marginal distributions are continuous, the multivariate probability density \(f\left( x \right)\) can be expressed in terms of the marginal densities \(f_{i} \left( {x_{i} } \right)\) of its component variables and a unique copula density \(c\):

$$f\left( x \right) = f_{1} \left( {x_{1} } \right)f_{2} \left( {x_{2} } \right) \ldots f_{m} \left( {x_{m} } \right)c\left( {u_{1} ,u_{2} , \ldots u_{m} } \right)$$
(2)

where \(u_{i} = F_{i} \left( {x_{i} } \right)\) are uniformly distributed random variables. Further information about copulas can be found e.g. in Frees and Valdez (1989), Nelsen (1999), Aas et al. (2009), and Vogl et al. (2012).

In this study, the nonparametric multivariate simulation approach based on the copula concept was applied to the spatially dependent extreme rainfall fields, while the events were assumed to be temporally independent. In order to preserve the spatial dependency of the data, we employed the sampling strategy outlined in Lall et al. (2016). The steps are as follows:

  1. Nonparametric log-spline density estimation was conducted for each grid cell (i = 1:76) over the extreme rainfall events (j = 1:266), as well as for each grid cell’s antecedent rainfall over all events, to estimate the marginal distributions.

  2. From each fitted \(f_{i} \left( {x_{i} } \right)\), a random sample (\(x'_{ij}\)) was drawn. The sampling was done with replacement and repeated 100 times (the number of simulations), and each sampled vector was sorted and stored in a matrix \(\left( {x''} \right)\).

  3. An empirical (pseudo) copula was considered. In this case, a copula function was applied to the empirical distribution functions (Deheuvels 1979) of the historical data set \(x_{j}\), j = 1:266. The empirical copula \(C_{emp} \left\{ {z_{j} ,\; j = 1{:}266} \right\}\) was constructed, where \(z_{j}\) is a rank matrix.

  4. From the rank matrix \(z_{j}\), 266 samples were drawn with replacement (bootstrap) and recorded as \(z_{j}^{\prime}\), j = 1:266. This step was also repeated 100 times (the number of simulations).

  5. Finally, given the sorted matrix \(\left( {x''} \right)\) and the matrix of ranks from the empirical copula \((z^{\prime}_{ij} )\) for each simulation, a simulated matrix was defined using the following equation:

    $$w_{ij} = x''_{i} \left[ {z_{ij}^{\prime} } \right]$$
    (3)

    where \(w_{ij}\) is the jth event of the simulated matrix at grid cell i, and \(x''_{i} \left[ {z_{ij}^{\prime} } \right]\) selects the \(z_{ij}^{\prime}\)th element of \(x''_{i}\). Figure 4 shows a sample illustration of Lall et al.’s (2016) approach for j = 1:12 extreme events using i = 1:3 grid cells for only one simulation. Variable \(x\) represents the rainfall values (mm/day) for 12 events (rows) among 3 grid cells (columns), \(x'\) represents the data sampled from the logspline distribution, and \(x''\) is \(x'\) sorted in ascending order, according to steps 1–2. In steps 3–4, \(z\), which is the rank matrix of \(x\), and \(z^{\prime}\), which is the resampled matrix of \(z\), are developed. To develop the simulated matrix, \(w\), we used Eq. 3 introduced in step 5. For instance, the second row of \(w\) is constructed based on the 2nd, 1st, and 2nd largest values of \(x''\) (all shaded in orange in Fig. 4). Another example is given for the 9th row of \(w\), shaded in blue.

    Fig. 4

    Sample illustration of the simulation algorithm based on 12 extreme events covering 3 grid cells (one simulation) adapted from Lall et al. (2016): Variable \(x\) represents the rainfall values (mm/day) across space and time, \(x'\) is the sampled data from logspline distribution, \(x''\) is the sorted matrix of \(x'\), \(z\) is the rank matrix of \(x\) and \(z^{\prime}\) is the resampled matrix of \(z\), and \(w\) is the simulated matrix derived from Eq. 3. (e.g., the second row of \(w\) is constructed based on the 2nd, 1st, and 2nd largest values of \(x''\) shaded in orange, and the 9th row of \(w\) is constructed based on the 11th, 10th, and 9th largest values of \(x''\) shaded in blue)

Employing this approach, the simulated fields of extreme rainfall data were obtained from \(w_{ij}\) and used to calculate the runoff and its uncertainty at the WWTPs. The general code and formulations corresponding to this approach are available from Lall et al. (2016).
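The five steps can be sketched compactly with NumPy. This is an illustrative sketch only and not the code of Lall et al. (2016): the log-spline marginal fit is replaced here by a Gaussian kernel smoother in log space (Silverman bandwidth), which is a swapped-in stand-in, and the function name, zero-based ranks, and the assumption of strictly positive rainfall values are all choices made for this example.

```python
import numpy as np

def simulate_fields(x, n_sim=100, seed=0):
    """x: array (n_events, n_cells) of observed extreme rainfall, all values > 0.

    Returns w with shape (n_sim, n_events, n_cells).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    cols = np.arange(m)
    # Step 3: rank matrix z (zero-based rank of each event within its column).
    z = x.argsort(axis=0).argsort(axis=0)
    # Stand-in for step 1: kernel smoothing of log rainfall, one bandwidth per cell.
    logx = np.log(x)
    h = 1.06 * logx.std(axis=0, ddof=1) * n ** (-0.2)
    w = np.empty((n_sim, n, m))
    for s in range(n_sim):
        # Step 2: draw n values per cell from the smoothed marginal, then sort.
        idx = rng.integers(0, n, size=(n, m))
        x1 = np.exp(logx[idx, cols] + h * rng.standard_normal((n, m)))
        x2 = np.sort(x1, axis=0)
        # Step 4: bootstrap whole rows of the rank matrix, which keeps the
        # cross-cell rank pattern of each historical event intact.
        zb = z[rng.integers(0, n, size=n)]
        # Step 5 (Eq. 3): w_ij = x''_i[z'_ij].
        w[s] = x2[zb, cols]
    return w
```

Resampling entire rows of the rank matrix is what carries the empirical spatial dependence into the simulated fields, while the marginal values come from the smoothed distributions.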

3.3 Sewer system uncertainty analysis

Urban hydrologic models can be classified with respect to spatial and temporal resolution (Fletcher et al. 2013). In the spatial dimension, models can be either lumped or distributed. Lumped models use spatial averages of sub-catchments to represent the behavior of the full system (Willems 2001b; Löwe et al. 2014), while distributed models capture all the sub-catchment components using a node-link structure (Elliott and Trowsdale 2007). In the temporal dimension, models can be event based or continuous. Event based analyses are commonly used in the design of infrastructure and simulate the hydrologic response to specifically designed storms (e.g., Delleur 2003), while continuous analyses seek to model system behavior under continuous forcing that includes periods of wet and dry weather. In this study, we used a spatially lumped and temporally event-based analysis to limit the computational expense and satisfy the temporal independence assumption of the simulation method (Lall et al. 2016).

Runoff is determined primarily by the amount of rainfall, the infiltration characteristics of the land cover, and antecedent rainfall. As explained earlier, the NRCS approach was employed in this study to calculate the runoff as a function of precipitation, the underlying soil’s permeability, land use, and antecedent water content of the soil:

$$P_{e} = \frac{{\left( {P - I_{a} } \right)^{2} }}{{\left( {P - I_{a} } \right) + S}}$$
(4)

where \(P_{e}\) is the effective rainfall (mm), \(P\) is the depth of rainfall (mm), \(S\) is the potential maximum retention after runoff begins (mm), and \(I_{a}\) is the initial abstraction (mm). The initial abstraction includes retained surface water as well as evaporated and infiltrated water, and is generally correlated with land cover parameters. Eq. 4 applies for \(P > I_{a}\); runoff cannot begin until the initial abstraction has been met, so \(P_{e} = 0\) otherwise. \(I_{a}\) can be approximated by \(I_{a} = 0.2 \times S\) for urban watersheds as per the USDA (Cronshey 1986). \(S\) is related to the soil and land cover conditions of the sewershed through the CN:

$$S = 25.4 \times \left( {\frac{1000}{CN} - 10} \right)$$
(5)

Because the curve number methodology is event based, the effects of antecedent moisture conditions must be taken into consideration. The CNs suggested by NRCS for the normal Antecedent Moisture Condition (AMCII) were mapped in Fig. 2b. NRCS suggests equivalent curve numbers depending on the season (dormant or growing) and the total 5-day antecedent rainfall. In the current case study, we assumed dormant-season conditions only in NYC and calculated the equivalent curve number according to the antecedent rainfall determined for gauge stations and radar grids. CN in dry conditions (AMCI, 5-day antecedent rainfall < 12.7 mm) and wet conditions (AMCIII, 5-day antecedent rainfall > 27.9 mm) can be computed by:

$$CN_{I} = \frac{{4.2 \times CN_{II} }}{{10 - 0.058 \times CN_{II} }}$$
(6)
$$CN_{III} = \frac{{23 \times CN_{II} }}{{10 + 0.13 \times CN_{II} }}$$
(7)

where I, II, and III represent dry, normal, and wet conditions, respectively. A 5-day antecedent rainfall between 12.7 mm (0.5 inch) and 27.9 mm (1.1 inch) was considered normal according to NRCS.
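Equations 4–7 translate directly into two small functions. The sketch below is illustrative, with hypothetical function names; it uses the dormant-season thresholds quoted in the text and the \(I_{a} = 0.2 \times S\) approximation, and it returns zero runoff when rainfall does not exceed the initial abstraction.

```python
def adjust_cn(cn_ii, antecedent_mm):
    """Convert an AMCII curve number to AMCI or AMCIII (Eqs. 6 and 7)."""
    if antecedent_mm < 12.7:                               # dry (AMCI)
        return 4.2 * cn_ii / (10.0 - 0.058 * cn_ii)
    if antecedent_mm > 27.9:                               # wet (AMCIII)
        return 23.0 * cn_ii / (10.0 + 0.13 * cn_ii)
    return cn_ii                                           # normal (AMCII)

def effective_rainfall(p_mm, cn):
    """Effective rainfall P_e in mm from rainfall depth P in mm (Eqs. 4 and 5)."""
    s = 25.4 * (1000.0 / cn - 10.0)   # potential maximum retention, mm
    ia = 0.2 * s                      # initial abstraction, mm
    if p_mm <= ia:
        return 0.0                    # abstraction not yet satisfied
    return (p_mm - ia) ** 2 / ((p_mm - ia) + s)

# Example: a 52 mm/day event on a cell with CN_II = 90 after a wet 5-day period.
pe = effective_rainfall(52.0, adjust_cn(90.0, antecedent_mm=40.0))
```

For a fully impervious cell (CN = 100), \(S = 0\) and the effective rainfall equals the rainfall depth, as expected.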

The framework of sewer system uncertainty analysis is summarized in Fig. 5. After preparing the data, the extreme rainfall field and corresponding antecedent events derived from the radar data were imported into the simulator. The NRCS approach was then applied to the simulated extreme precipitation events. CN and S were estimated from Eqs. 5–7 and applied to Eq. 4 to determine the effective rainfall at each sub-grid cell. We subtracted the catch basin storage volumes per event (\(Q_{CB}\)) to compute the excess runoff from each event in each sewershed (\(Q_{s}^{t}\)):

$$Q_{s}^{t} = \mathop \sum \limits_{n = 1}^{N} \left( {P_{e,n}^{t} A_{n} - Q_{CB,n}^{t} } \right)$$
(8)

where \(P_{e,n}^{t}\) is the effective rainfall at sub-grid cell n for event t, \(A_{n}\) is the area of sub-grid cell n, \(Q_{CB,n}^{t}\) is the catch basin storage volume at sub-grid cell n during event t, N is the number of sub-grid cells in each sewershed (N changes for each sewershed, see Table 1), and s is the sewershed index (1:14). The simulated rainfall fields are verified in Sect. 4.1.
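Equation 8 amounts to a sum over the sub-grid cells of one sewershed, once units are made consistent. The sketch below is illustrative: the function name and the unit choices (effective rainfall in mm, area in km², storage in m³) are assumptions for the example, and negative cell contributions are not clipped because Eq. 8 does not clip them.

```python
def excess_runoff_m3(pe_mm, area_km2, basin_storage_m3):
    """Excess runoff (m3) for one sewershed and one event, following Eq. 8.

    pe_mm, area_km2 and basin_storage_m3 are equal-length sequences with one
    entry per sub-grid cell of the sewershed.
    """
    total = 0.0
    for pe, area, q_cb in zip(pe_mm, area_km2, basin_storage_m3):
        runoff_m3 = (pe * 1e-3) * (area * 1e6)   # mm -> m and km2 -> m2
        total += runoff_m3 - q_cb
    return total
```

Dividing the result by the sewershed area gives a depth that can be compared directly with the area-normalized treatment capacity (2 × DDWF/Area).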

Fig. 5

Proposed framework for sewer system uncertainty analysis

The excess runoff was also calculated using the rain gauge data according to the independent events developed in Sect. 3.1. The goal is to investigate the significance of considering the spatial dependence of the extreme rainfall fields between the grids by comparing \(Q_{s}^{t}\) derived from the simulated extreme rainfall field with \(Q_{s}^{t}\) derived from the spatially independent events (from rain gauges). In calculating the runoff corresponding to rain gauge extreme rainfall, we adopted the same grouping used by NYC DEP (see Fig. 1b). Thus \(P_{e,n}^{t}\) in Eq. 8 is identical for n = 1:N and is set by the rain gauge station assigned to the sewershed. Finally, we estimated the probability of CSO occurrence during extreme rainfall events, and the runoff change with respect to the land cover distribution and density. This analysis targets the stormwater management plans of the City.

4 Results and discussion

4.1 Rainfall simulation verification

The distributions of simulated radar rainfall data were compared with the observed radar events in order to verify the simulation model. First, the median, standard deviation, and 90th percentile of the rainfall over the 266 extreme events at each of the 76 grids were compared with the corresponding simulated values. Figure 6 shows the simulated versus observation-based median, standard deviation, and 90th percentile at three sample grids corresponding to the CP, LGA, and JFK station locations (see Fig. 1 for the locations). Aside from a small negative bias in the standard deviation of the simulated marginal distributions, which is less than 9% (Fig. 6b), the simulated and observed marginal distributions are quite similar.

Fig. 6

Comparison of observed (red dots) and simulated (boxplots) extreme rainfall events at the three grid cells at the locations of CP, LGA, JFK: a median, b standard deviation, c 90th percentile
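The marginal check summarized in Fig. 6 can be reproduced by computing the same three statistics for the observations and for every simulation. The sketch below is illustrative; the function name, array layout, and the use of the sample standard deviation are assumptions for the example.

```python
import numpy as np

def marginal_summary(obs, sims):
    """obs: (n_events, n_cells); sims: (n_sim, n_events, n_cells).

    For each statistic, returns the observed value per cell, the value per
    simulation and cell, and the percent bias of the simulation mean.
    """
    stats = {
        "median": lambda a, ax: np.median(a, axis=ax),
        "std": lambda a, ax: np.std(a, axis=ax, ddof=1),
        "p90": lambda a, ax: np.percentile(a, 90, axis=ax),
    }
    out = {}
    for name, fn in stats.items():
        o = fn(obs, 0)     # shape (n_cells,)
        s = fn(sims, 1)    # shape (n_sim, n_cells)
        out[name] = {
            "obs": o,
            "sim": s,
            "bias_pct": (s.mean(axis=0) - o) / o * 100.0,
        }
    return out
```

The `sim` arrays hold the spread that a boxplot would show for each cell, and `obs` corresponds to the red dots.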

We also compared the cross-station dependence of extreme rainfall field grids between the simulations and observations by comparing their rank correlation (RC), mutual information (MI), and tail dependence coefficient (TDC) across the grids (Fig. 7). Spearman’s rank-order correlation measures the strength and direction of a monotonic relationship between each pair of grid cells’ extreme rainfall data and can take values from − 1 to + 1. A value of 0 indicates that there is no association between the two variables and values greater than 0 indicate a positive association. Figure 7a shows high correlation between the grids at the same WWTP, shaded along the diagonal. Figure 7b also shows that the correlation values of the simulated and observed data are quite similar (between 0.4 and 1) and that the bias is generally between − 1 and 1%.

Fig. 7

Cross-station dependence of the observational data and the bias in cross-station dependence of the simulation relative to the observation [\((Sim - Obs)/Obs \times 100\)]: a, b rank correlation, c, d mutual information, e, f tail dependence coefficient

Mutual Information (MI), introduced by Shannon (1948), measures how far the joint distribution of two grid cells departs from the product of their marginal distributions, i.e., it captures nonlinear dependence (Cover and Thomas 1991). The results are scaled according to the transformation proposed by Joe (1989), which ranges from 0, for complete independence, to 1, for full dependence (large reduction in uncertainty). There is high MI between the grids at the same WWTP (Fig. 7c) and the bias in MI is generally modest (between − 5 and 5%), with the exception of a few high bias grids (up to 30%) (blue stripes in Fig. 7d). Lastly, the Tail Dependence Coefficient (TDC) measures the probability that rainfall at one grid exceeds its 90th percentile given that another grid also exceeds its 90th percentile (Ferreira 2013). Figure 7e shows high TDC between the grids within the areas serviced by the same WWTP. Figure 7f implies that the TDC values of the simulated and observed data are quite similar and that the bias is generally between − 5 and 5%.

Overall, the rank correlation is well simulated by the model (errors are less than 4% for all pairs of grids). This indicates that the model captures the monotonic relationships between sites. The bias in the mutual information between sites is greater (upwards of 5% for many pairs of grids). In particular, the model has a tendency to simulate a weaker nonlinear relationship between pairs of grids when compared to the observations (panels c and d of Fig. 7). There is also relatively large bias in the simulation of tail dependence between grids in some cases, although this bias is not systematic across all grid pairs (bottom panels of Fig. 7), i.e. negative tail dependence bias appears to be approximately as likely as positive tail dependence bias.
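Two of the three dependence measures, and the percent bias used in Fig. 7, can be sketched with NumPy alone. This is an illustrative sketch with hypothetical function names: ranks are computed without tie averaging (reasonable only for data with few ties), the tail dependence is an empirical estimate at a fixed 90th percentile threshold, and mutual information is omitted because its estimator and the Joe (1989) scaling are not specified in enough detail here.

```python
import numpy as np

def spearman_matrix(x):
    """Rank correlation between all pairs of columns of x (n_events, n_cells)."""
    ranks = x.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def tail_dependence(x, q=0.90):
    """T[i, j] = empirical P(cell j > its q-quantile | cell i > its q-quantile)."""
    thr = np.quantile(x, q, axis=0)
    exc = (x > thr).astype(float)
    joint = exc.T @ exc
    return joint / np.diag(joint)[:, None]

def percent_bias(sim, obs):
    """Relative bias of the simulation, (Sim - Obs) / Obs x 100."""
    return (sim - obs) / obs * 100.0
```

Applying `percent_bias` to the simulated and observed matrices gives the panels on the right-hand side of Fig. 7; averaging the simulated matrices over the 100 simulations first is one reasonable convention, though the text does not say which was used.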

4.2 Simulated runoff comparison

Figure 8a, b compares the mean and distribution of the simulated runoff with the rain gauge runoff at each WWTP. The results indicate that the runoff values corresponding to the rain gauge data are significantly higher than the radar-simulated runoff at the 95% confidence level. Because the simulation used the spatially varying extreme rainfall field (76 grids) while the rain gauge runoff calculation assumed uniform extreme rainfall, this overestimation of rain gauge runoff was expected. Such overestimation of hazard when spatial dependence is not considered has been reported in other studies (McRobie et al. 2013; Simões et al. 2015). This confirms the importance of considering the spatial variation and dependence of extreme rainfall, as hypothesized in this paper.

Fig. 8 Runoff values of simulated extreme rainfall fields (radar data) and runoff values of the rain gauge extreme rainfall events at each of 14 WWTPs: a means, b distributions

To check whether there is a systematic bias in the extreme radar rainfall data, we evaluated the radar rainfall values at the four grid cells corresponding to the rain gauge locations on days when extreme events occurred, as determined by the rain gauge records. We computed the relative bias in the radar data at each rain gauge station during every extreme event. Figure 9 shows that there is only modest bias in the radar rainfall values during rain-gauge-based extreme rainfall events. The median bias among the events is 0, 5, 0, and 12% at CP, LGA, JFK, and EWR, respectively. This bias is acceptable given that rain gauge rainfall estimates are derived from time-integrated point measurements, while radar rainfall estimates are derived from spatially integrated, temporally discrete measurements. Such discrepancies have been noted by other researchers (e.g. Medlin et al. 2007; Villarini et al. 2010; Park et al. 2016).
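The per-event bias check at one gauge can be sketched as below. The sign convention (radar minus gauge, divided by gauge) is an assumption, since the text does not state the formula. The rainfall depths are made-up illustrative numbers, not data from the study.

```python
from statistics import median


def event_relative_bias(radar_mm, gauge_mm):
    """Per-event relative bias (%) of radar rainfall against the gauge.

    Assumed convention: (radar - gauge) / gauge * 100.
    Events with zero gauge rainfall are skipped to avoid division by zero.
    """
    return [(r - g) / g * 100.0 for r, g in zip(radar_mm, gauge_mm) if g > 0]


# Hypothetical daily depths (mm) on five gauge-defined extreme event days.
gauge = [50.0, 80.0, 100.0, 40.0, 60.0]
radar = [55.0, 76.0, 100.0, 44.0, 63.0]

bias = event_relative_bias(radar, gauge)
median_bias = median(bias)  # summary statistic reported per station
```

Repeating this for each of the four stations would give one distribution of biases per panel of Fig. 9.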

Fig. 9 Bias check of radar rainfall data with respect to the rain gauge extreme events at: a CP, b LGA, c JFK, d EWR

4.3 Sewer system uncertainty analysis results

4.3.1 WWTP uncertainty (end-of-pipe control)

We estimated the excess runoff at each WWTP and compared the extreme events' runoff with 2 × DDWF to determine the probability of exceeding the capacity at each WWTP. We accounted for baseflow wastewater [called Dry Weather Flow or DWF (NYCDEP 2012a)] by assuming that the ratio of DWF to DDWF (DWF/DDWF) ranges between 35 and 75% during all events. The capacity left for precipitation-generated flow is therefore 2 × DDWF − DWF. Thus, if \(Q_{s}^{t}\) is the precipitation-generated flow and we assume that DWF/DDWF is 50%, the probability of exceeding the flow capacity at each WWTP is \(P(Q_{s}^{t} > (1.5 \times DDWF))\). The boxplots in Fig. 10 represent the probability of the simulated runoff exceeding the plant capacities under various values of DWF/DDWF. For instance, P = 0.02 means that 2% of extreme events exceed the design capacity of the WWTP. The average baseflow (DWF/DDWF) in NYC WWTPs during 2011 is presented in Table 2 (from NYCDEP 2012a). Using these data, the median probabilities of exceeding WWTP capacity are mapped in Fig. 11. The results indicate a higher probability of exceeding the capacity (and thus a higher likelihood of CSO) at the JAM (Jamaica), OH (Owls Head), and BB (Bowery Bay) WWTPs. Recent DEP construction projects have included upgrades to the wastewater treatment facilities and storm sewer system by expanding the network and constructing large CSO retention tanks to further mitigate this chronic source of pollution. Some of the most recent CSO control systems in the City have been implemented at the BB, JAM, TI, and CI plant outfalls (http://www.nyc.gov/dep). The results of Figs. 10 and 11 can be a useful guide for planning the City's end-of-pipe stormwater storage and treatment systems.
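The exceedance calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the flow values, the DDWF value, and the grid of baseflow ratios are hypothetical. Only the threshold rule, precipitation-generated flow greater than 2 × DDWF − DWF, comes from the text.

```python
def exceedance_probability(storm_flows, ddwf, dwf_ratio):
    """Fraction of simulated events whose precipitation-generated flow
    exceeds the remaining plant capacity 2*DDWF - DWF, where
    DWF = dwf_ratio * DDWF."""
    threshold = (2.0 - dwf_ratio) * ddwf
    exceed = sum(1 for q in storm_flows if q > threshold)
    return exceed / len(storm_flows)


# Hypothetical simulated flows at one plant (same units as DDWF).
flows = [50.0, 90.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 40.0, 60.0]
ddwf = 100.0

# Sweep the assumed baseflow ratio over the 35-75% range used in the text.
ratios = [0.35, 0.45, 0.55, 0.65, 0.75]
prob_by_ratio = {r: exceedance_probability(flows, ddwf, r) for r in ratios}
```

A higher baseflow ratio leaves less capacity for stormwater, so the exceedance probability cannot decrease as DWF/DDWF increases.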

Fig. 10 Probability of exceeding (2 × DDWF − DWF) at each NYC WWTP under simulated extreme rainfall events for different ratios of DWF/DDWF (baseflow) at the plants

Table 2 Average baseflow in NYC WWTPs during 2011
Fig. 11 Risk map of the median probability of exceeding the NYC WWTPs' design capacity (2 × DDWF − DWF) with respect to the baseflow (DWF/DDWF) of 2011

4.3.2 Excess runoff prediction (source control)

We also estimated the change in runoff with respect to changes in the City's stormwater capture infrastructure. First, we determined Qc as the contribution (%) of each sub-grid to the total runoff of the corresponding sewershed. Then, a nonparametric joint distribution was estimated for the simulated runoff contribution and the corresponding curve numbers weighted by the area of each sub-grid cell. The joint distribution of Qc and CN × Area of the sub-grids is presented in Fig. 12a. The x-axis is the Curve Number weighted by area (CN × Area), the y-axis is the runoff contribution Qc (%), and the contours in the 2D plot show the probability density function of the joint distribution. This approach may be useful in planning urban infrastructure. In NYC, the recent agreement between the City and the New York State Department of Environmental Conservation aims to reduce CSOs through a hybrid Green Infrastructure (GI) and grey infrastructure approach to improve the water quality in NYC's waterways (http://www.nyc.gov/dep). GI is a source control approach that manages stormwater by detaining or retaining excess runoff through capture and controlled release, by infiltrating runoff into the ground, and by increasing vegetative uptake and evapotranspiration. GI therefore reduces the need for end-of-pipe stormwater storage and treatment systems, while providing additional benefits such as counteracting urban heat island effects (Wang et al. 2013). From the results in Fig. 12a, we were able to estimate the most effective GI placement from a CSO mitigation perspective. Figure 12b presents the median simulated runoff contribution (Qc) of each sub-grid cell to the corresponding sewershed, such that the Qc values within each sewershed sum to 100. Figure 12 illustrates where the most effective sub-grids are for the introduction of new GI if the goal is a 1% reduction of runoff within a certain sewershed by decreasing the CN of the corresponding sub-grid by 1%. The values of initial runoff contribution and initial land cover (represented by CN) were taken from Figs. 12b and 2b, respectively. By comparing the surface of Fig. 12a for different neighborhoods (sub-grid cells), we can find the areas in NYC with higher probabilities of ∆Qc for a given ∆CN × Area. Figure 12c indicates those neighborhoods (top 20 sub-grid cells), mostly located in the area served by the JAM (Jamaica) WWTP, for the assumed criteria. Also, for a specific neighborhood, if the plan is to reduce Qc by a certain amount, the best plan can be chosen by optimizing the trade-off between the new CN and the cost of installing GI.
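The two building blocks of this analysis, the runoff contribution Qc and a nonparametric joint density of (CN × Area, Qc), can be sketched as below. This is an illustration only. The text does not specify which nonparametric estimator was used, so the product-Gaussian kernel with fixed bandwidths is an assumption, as are the function names and the sample values.

```python
import math


def runoff_contribution(runoff_by_cell):
    """Qc: percent contribution of each sub-grid cell to the sewershed total.
    The returned values sum to 100 within the sewershed."""
    total = sum(runoff_by_cell.values())
    return {cell: 100.0 * q / total for cell, q in runoff_by_cell.items()}


def gaussian_kde_2d(xs, ys, hx, hy):
    """Return f(x, y), a product-Gaussian kernel density estimate.

    Assumption: fixed bandwidths hx and hy chosen by the user; the
    study's actual estimator and bandwidth rule are not stated.
    """
    n = len(xs)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def density(x, y):
        return norm * sum(
            math.exp(-0.5 * (((x - xi) / hx) ** 2 + ((y - yi) / hy) ** 2))
            for xi, yi in zip(xs, ys)
        )

    return density


# Hypothetical simulated runoff per sub-grid cell in one sewershed.
runoff = {"cell_a": 30.0, "cell_b": 50.0, "cell_c": 20.0}
qc = runoff_contribution(runoff)

# Hypothetical CN x Area values paired with the Qc values above.
cn_area = [60.0, 85.0, 45.0]
pdf = gaussian_kde_2d(cn_area, list(qc.values()), hx=10.0, hy=5.0)
```

Evaluating `pdf` on a grid of (CN × Area, Qc) values would produce contours of the kind shown in Fig. 12a.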

Fig. 12 GI installation as a source control approach: a 2D plot of the probability density function of runoff contributions (Qc) and Curve Number times Area (CN × Area), b contributed discharge to each sewershed, mapped for the median of the simulated extreme rainfall events in NYC, c the most effective neighborhoods (top 20 sub-grid cells) for installing GI in NYC to decrease Qc by 1% by decreasing the CN of the corresponding sub-grid cell by 1%

5 Conclusions

A novel framework that utilizes the simulation approach of Lall et al. (2016) was developed in this study to estimate the urban runoff during extreme precipitation events in NYC and compare this runoff to wastewater treatment plant flowrate capacities. The extreme precipitation simulation framework allowed us to simulate the uncertainty of extreme rainfall without neglecting the spatial structure of extreme rainfall data. The main conclusions of this study for NYC are summarized as follows:

  1. According to the current analysis, the JAM (Jamaica), OH (Owls Head), and BB (Bowery Bay) WWTPs are more prone to CSO under extreme rainfall events (see Figs. 10 and 11).

  2. In NYC, we were able to determine the neighborhoods where installing GI has the greatest effect on controlling excess runoff (see Fig. 12). The results were presented for the top 20 sub-grid cells for an example scenario of reducing runoff within a certain sewershed by 1% by decreasing the curve number (CN) of the corresponding sub-grid by 1%. However, these locations may change under different infrastructure scenarios.

In summary, the main contributions of this study are listed as follows:

  1. The results of Sect. 4.2 confirmed the significance of preserving the spatial dependence of the extreme rainfall field between the grid cells in hydrologic modeling. Specifically, assuming uniform extreme rainfall (based on a rain gauge within a sewershed) can lead to overestimation of runoff.

  2. The uncertainty analysis of WWTPs in Sect. 4.3 provided a guideline approach for end-of-pipe stormwater management. This paper presented a straightforward strategy for city planners to investigate the effect of infrastructure change on stormwater runoff as a source control system.