Spatial coherence patterns of extreme winter precipitation in the U.S.

Extreme precipitation events have a significant impact on life and property. The U.S. experiences huge economic losses due to severe floods caused by extreme precipitation. With the complex terrain of the region, it becomes increasingly important to understand the spatial variability of extreme precipitation to conduct a proper risk assessment of natural hazards such as floods. In this work, we use a complex network-based approach to identify distinct regions exhibiting spatially coherent precipitation patterns due to various underlying climate mechanisms. To quantify interactions between event series of different locations, we use a nonlinear similarity measure, called the edit-distance method, which considers not only the occurrence of the extreme events but also their intensity, while measuring similarity between two event series. Using network measures, namely, degree and betweenness centrality, we are able to identify the specific regions affected by the landfall of atmospheric rivers in addition to those where the extreme precipitation due to storm track activity is modulated by different mountain ranges such as the Rockies and the Appalachians. Our approach provides a comprehensive picture of the spatial patterns of extreme winter precipitation in the U.S. due to various climate processes despite its vast, complex topography.


Correspondence: Abhirup Banerjee, abhirup.banerjee@pik-potsdam.de

Introduction
Extreme precipitation poses a serious threat to the lives and livelihoods of people all around the world. With the intensification of extreme precipitation and the increase in flood events over most climate regions (Tabari 2020; Easterling et al. 2017; Janssen et al. 2014; Vu and Mishra 2019; Kunkel et al. 2012) due to climate change, understanding the spatial variability of extreme precipitation is crucial. Previous studies of extreme precipitation events have examined their secular variations (Kunkel et al. 2012), their spatiotemporal variability (Mullen 2008), and their relation to large-scale meteorological patterns. Here, we focus on the spatial connectivity of extreme precipitation events, which is relevant for understanding river flood generation and anticipating the spatial extent of simultaneous flooding (Brunner et al. 2020; Kemter et al. 2020). Understanding the spatial dependence of extreme precipitation and its underlying mechanisms is vital for assessing the risk from natural hazards. Simultaneous extreme precipitation across large scales can lead to synchronous flooding in multiple states, which has a greater societal and financial impact than independent, localized flood events because of regional interdependencies in risk management, infrastructure, and insurance (Jongman et al. 2014).
We use a complex network-based approach to study the spatial patterns of extreme winter precipitation in the U.S. Climate network analysis can help to identify the regions that are most likely to experience concurrent precipitation extremes, along with the climatic conditions responsible for their generation. Climate networks belong to the category of functional networks, in which pairwise statistical dependencies between the time series associated with the nodes are computed to estimate the functional connectivity between them. The topological structure of the network is then analyzed using different network measures (Donges et al. 2009b; Fan et al. 2021; Tsonis and Roebber 2004).
The network representation of spatiotemporal climate data allows us to study pairwise interactions between climate variables at different locations, and the analysis of the resulting climate network helps us to understand the spatial pattern of climate variability. Standard similarity measures used to estimate the strength of climate interactions include Pearson's and Spearman's correlation coefficients. However, these are not suitable for evaluating relationships within extreme precipitation data, which are event-like time series. Therefore, nonlinear synchronization measures specifically designed to compute the similarity between event series are required to construct extreme precipitation networks. Recent works have extensively used event synchronization (ES) (Quian Quiroga et al. 2002), in particular, to construct climate networks for event-like data such as extreme precipitation (Malik et al. 2011; Stolbova et al. 2014; Ozturk et al. 2019) and heat wave patterns (Mondal and Mishra 2021). Boers et al. (2013), Boers et al. (2014a), and Boers et al. (2014b) used complex networks constructed with ES to study the South American Monsoon, and the same approach was used to reveal the global extreme precipitation pattern (Boers et al. 2019). Konapala and Mishra (2017) used the same climate network framework to study hydroclimatic extreme events. Agarwal et al. (2017) introduced multi-scale event synchronization by combining the wavelet transform and ES. However, ES considers only the time of occurrence of events when identifying coincidences and deriving the degree of similarity; it ignores differences in the strength or amplitude of the events. While some previous works (Ciemer et al. 2018) have proposed modified correlation measures to investigate spatial co-variability patterns of general precipitation (i.e., considering the amplitude variability), these measures are linear and thus of limited use for studying extreme precipitation behavior.
In our study, we use a distance metric designed specifically to measure the similarity between spike trains, called edit distance (ED), first proposed by Victor and Purpura (1997) and later extended by Hirata and Aihara (2009). This metric has been used in combination with recurrence plots (Eckmann 1987) to analyze the recurrence properties of marked point process data (Suzuki et al. 2010), paleoclimate data (Ozken et al. 2015; Ozken et al. 2018), and extreme event-like hydrological data (Banerjee et al. 2021). Recently, Agarwal et al. (2022) integrated ED with climate networks to study the extreme rainfall pattern in the Ganga River basin and highlighted its advantage over ES. In ED, each event series is treated as a marked point process, and the similarity between two such event series is measured by minimizing the cost of transforming one event series into the other through elementary operations, such as shifting in time, changing the amplitude, and adding or deleting events.
Spatial patterns of different network measures, namely the degree and the betweenness centrality, are used to study the spatial connectivity pattern of extreme winter (DJF) precipitation events. While the degree centrality is based on local topological information, the path-based betweenness centrality includes the global topological information (Donges et al. 2009b). Through our approach, we are not only able to identify regions with distinct extreme precipitation patterns, but also delineate the regions affected by atmospheric rivers and tornadoes. In Section 2, we describe in detail the data and the methodology. In Section 3, we discuss the results based on our network analysis and draw an interpretation from a climatological point of view.

Data source and data pre-processing
In this study, we use daily averaged precipitation, geopotential height and wind at different pressure levels, and vertically integrated water vapor flux (IVT) data derived from the ERA5 reanalysis (Hersbach et al. 2020) for the period 1980-2020. The spatial resolution used is 0.5° × 0.5°. Although reanalysis precipitation data do show biases compared to observations, observational datasets typically have either a limited spatial coverage (GPCC, TRMM, etc.), a lower resolution (GPCP), or a limited temporal coverage (TRMM). ERA5 shows, in most cases, smaller biases than other reanalysis datasets (JRA-55, MERRA-2) (Hassler and Lauer 2021). Nevertheless, we verify the robustness of our results by comparing them with those obtained using JRA-55 (Japan Meteorological Agency 2013) (see figures in the Supporting information).
To construct the extreme precipitation event series from the daily averaged precipitation time series at each grid point, we consider as events only those days on which precipitation is among the highest 5% of all values (including dry days without precipitation) in a particular season (here, DJF) at that location. This results in 4 to 5 events per season (Malik et al. 2011; Boers et al. 2013; Stolbova et al. 2014).
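The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is one season of daily values at a single grid point, the function name `extreme_event_series` and the parameter `top_percent` are hypothetical, and the handling of ties and of zero-precipitation days is a choice made here, not something the text specifies.

```python
def extreme_event_series(precip, top_percent=5):
    """Return (day_index, amplitude) pairs for the wettest days of one season.

    Dry days stay in the sample when the threshold is computed, so a 90-day
    season yields about 4 events for top_percent=5.
    """
    if not precip:
        return []
    ranked = sorted(precip)
    # Number of days allowed as events (integer arithmetic, at least one).
    n_events = max(1, len(ranked) * top_percent // 100)
    threshold = ranked[-n_events]
    # Assumption: a day without precipitation is never counted as an event,
    # and days tied with the threshold value are all kept.
    return [(day, value) for day, value in enumerate(precip)
            if value >= threshold and value > 0]
```

Each returned pair carries both the timing and the amplitude of an event, which is the information a marked point process representation needs.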

Network construction
A network or graph comprises two main components: a set of nodes $V$ and a collection of edges $E$. Mathematically, a network is expressed as $G = \{V, E\}$ (Donges et al. 2009b; Sivakumar and Woldemeskel 2014). In the case of a climate network, each geographical grid point of the climate dataset is considered a node, and an edge is placed when there is a statistically significant association or functional dependency between two nodes. To construct the climate network for extreme precipitation, we first transform the precipitation time series at each grid point into an extreme precipitation event series as described earlier. Then, we construct the network for extreme precipitation events to study its pattern of spatial variability as follows.

Fig. 1 Schematic of the transformation of segment $S_a$ to $S_b$ through four steps numbered as steps $S_1, \ldots, S_3$. The path shown is a minimal-cost path and all steps are elementary steps, i.e., shifting an event, amplitude modulation, deleting/inserting an event
In this study, we use the edit distance (ED) method, which takes into account both the sequence and the amplitude of events. In general, ED is a distance metric that quantifies the similarity or dissimilarity between two spike trains (Victor and Purpura 1997; Banerjee et al. 2021). ED treats each event series as a marked point process (Suzuki et al. 2010; Ozken et al. 2015; Ozken et al. 2018). The idea is to transform one event series into another by performing elementary operations: shifting in time, amplitude modulation, and deletion/insertion of events (Fig. 1). A specific cost is assigned to each such operation. The total transformation cost of converting one event series into the other is computed by tracing the minimal-cost path.
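The minimal-cost idea can be illustrated with a short dynamic-programming sketch in Python. It is written under assumptions that the text does not confirm: matched events are taken to preserve their temporal order (as in the Victor and Purpura spike-train setting), the function name `edit_distance` and the three cost arguments are hypothetical, and the cost values in any call are placeholders rather than the parameters computed following Agarwal et al. (2022).

```python
def edit_distance(events_a, events_b, cost_time, cost_amp, cost_indel):
    """Minimal transformation cost between two time-ordered event series.

    Each series is a list of (time, amplitude) pairs sorted by time.
    """
    n, m = len(events_a), len(events_b)
    # cost[i][j]: cheapest way to turn the first i events of A
    # into the first j events of B.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * cost_indel  # delete the first i events of A
    for j in range(1, m + 1):
        cost[0][j] = j * cost_indel  # insert the first j events of B
    for i in range(1, n + 1):
        t_a, l_a = events_a[i - 1]
        for j in range(1, m + 1):
            t_b, l_b = events_b[j - 1]
            shift = cost_time * abs(t_a - t_b) + cost_amp * abs(l_a - l_b)
            cost[i][j] = min(
                cost[i - 1][j] + cost_indel,   # delete event i of A
                cost[i][j - 1] + cost_indel,   # insert event j of B
                cost[i - 1][j - 1] + shift,    # shift event i onto event j
            )
    return cost[n][m]
```

For example, with `cost_time=1.0`, `cost_amp=0.1`, and `cost_indel=3.0`, the series `[(1, 10.0), (5, 20.0)]` and `[(2, 10.0), (5, 25.0)]` are cheapest to align by pairing both events (one day of shift plus five units of amplitude change), which is less costly than deleting and re-inserting them.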
The mathematical formulation of the distance metric is as follows. For two given segments $S_a$ and $S_b$, the minimum cost of transformation is defined as

$$D(S_a, S_b) = \min_{C} \left\{ \sum_{(\alpha, \beta) \in C} \left[ \Lambda_0 \, \left| t_a(\alpha) - t_b(\beta) \right| + \Lambda_1 \, \left| L_a(\alpha) - L_b(\beta) \right| \right] + \Lambda_s \left( |I| + |J| - 2|C| \right) \right\} \qquad (1)$$

The times and amplitudes of events are denoted as $t_a(\alpha)$, $t_b(\beta)$ and $L_a(\alpha)$, $L_b(\beta)$, respectively. $\Lambda_0$ and $\Lambda_1$ are the cost coefficients for shifting in time and for changing the amplitude, while the cost allotted to each insertion or deletion operation is $\Lambda_s$. The first term of Eq. (1) sums the cost of shifting in time and of amplitude modulation between the $\alpha$th event in $S_a$ and the $\beta$th event in $S_b$. $C$ is the set containing all the pairs associated in this operation. The second term in Eq. (1) accounts for the deletion/insertion operations. $I$ and $J$ are the sets of indices of events in $S_a$ and $S_b$, respectively, and $|I|$, $|J|$, and $|C|$ are the cardinalities of the sets $I$, $J$, and $C$. All three cost parameters are computed as suggested by Agarwal et al. (2022). A lower transformation cost between two event series implies a higher similarity, and vice versa.

We calculate the transformation cost for every pair of event series $S_i$ and $S_j$, corresponding to nodes $i$ and $j$ of the gridded extreme event dataset, using the above method. This gives us the similarity matrix $Q_{ij}$ (here, a cost matrix). Thereafter, we obtain the adjacency matrix $A_{ij}$, which gives the edges of our network, by thresholding $Q_{ij}$ with a suitable threshold. Mathematically, $A_{ij} = \Theta(\varepsilon - Q_{ij}) - \delta_{ij}$, where $\Theta$ is the Heaviside function (i.e., we assign 1 when the cost is below the threshold and 0 otherwise), $\varepsilon$ is the threshold, and $\delta_{ij}$ is the Kronecker delta, which removes self-loops. All pairs of grid cells whose transformation cost is below the threshold are thus connected by an edge. In this study, to find the significant edges, we fix the edge density of the network at $\rho = \frac{2|E|}{N(N-1)} = 5\%$, where $|E|$ is the number of edges and $N$ the number of nodes, and choose the corresponding threshold $\varepsilon(\rho)$ (Malik et al. 2011; Stolbova et al. 2014; Wiedermann et al. 2017).
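The thresholding step can be sketched as follows, assuming a symmetric cost matrix stored as a list of lists. The function name `adjacency_from_cost` is hypothetical, and the handling of tied costs (all pairs at the threshold value are linked, so the realized density can slightly exceed the target) is an assumption of this sketch rather than something stated in the text.

```python
def adjacency_from_cost(cost_matrix, density=0.05):
    """Link the node pairs with the lowest transformation costs.

    Returns the binary adjacency matrix and the cost threshold that
    yields (approximately) the requested edge density.
    """
    n = len(cost_matrix)
    adjacency = [[0] * n for _ in range(n)]
    # Costs of all unordered node pairs (upper triangle, no self-pairs).
    pair_costs = sorted(cost_matrix[i][j]
                        for i in range(n) for j in range(i + 1, n))
    if not pair_costs:
        return adjacency, None
    n_edges = max(1, int(round(density * len(pair_costs))))
    threshold = pair_costs[n_edges - 1]
    for i in range(n):
        for j in range(n):
            # Low cost means high similarity; skipping i == j removes self-loops.
            if i != j and cost_matrix[i][j] <= threshold:
                adjacency[i][j] = 1
    return adjacency, threshold
```

Fixing the density rather than the threshold makes networks built from different datasets comparable, since each ends up with the same share of linked pairs.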

Network measures
Various network measures are used to quantify the network topology; they provide novel insights into the underlying dynamics of the system over different spatial scales (Donges et al. 2009a). We use two network measures to quantify and characterize the spatial pattern of extreme precipitation. One of the basic local network measures is the degree, which measures the centrality of a node based on how well-connected it is. The degree $k_i$ of a node $i$ is defined as

$$k_i = \sum_{j=1}^{N} A_{ij} \qquad (2)$$

where $N$ is the total number of grid points (nodes). It quantifies the number of direct connections node $i$ has with other nodes in the network (Fig. 2a).
In climate networks, nodes connected by a link indicate areas with similar climate variability. The nodes with higher degree values $k_i$ are crucial regions that regulate the connectivity of the network and are typically related to large-scale atmospheric circulation (Malik et al. 2011; Boers et al. 2013; Boers et al. 2014b). Degree has been used to identify highly connected geographical sites (super-nodes) and their association with atmospheric teleconnection patterns (Tsonis et al. 2008; Radebach et al. 2013; Agarwal et al. 2019).
The second network measure we use here is the betweenness centrality, which provides information about the global topology of the network on the basis of shortest paths between pairs of nodes (Donges et al. 2009a).
Betweenness centrality $BC_i$ measures how much a node $i$ falls "in between" two nodes in the network, i.e., acts as a bridge connecting two other nodes (Newman 2010; Freeman 1978). A node may not be well-connected (i.e., it has a low degree) but can still be crucial for connecting different parts of the network (Golbeck 2015) (Fig. 2b). Betweenness is quantified by the fraction of shortest paths that must go through this specific node $i$ and is defined as

$$BC_i = \sum_{j, k \neq i} \frac{\sigma_{jk}(i)}{\sigma_{jk}} \qquad (3)$$

where $\sigma_{jk}$ is the total number of shortest paths between nodes $j$ and $k$, and $\sigma_{jk}(i)$ is the number of those shortest paths that pass through node $i$. In a social network, BC indicates the importance of a node in controlling the flow of information in the network. For functional networks such as climate networks, however, it represents boundaries between highly connected regions (Molkenthin et al. 2014; Tupikina et al. 2016). BC has been used to uncover energy flow patterns in the atmosphere (Donges et al. 2009b) and has also been successfully applied to study the extreme precipitation patterns of different monsoon systems (Boers et al. 2013; Stolbova et al. 2014).
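Both measures can be computed from a binary adjacency matrix with the short Python sketch below. The text does not say which algorithm was used for the betweenness; Brandes' breadth-first accumulation is swapped in here as a standard choice, and the function names `degree` and `betweenness` are hypothetical. The betweenness is left unnormalized and counts each unordered node pair once.

```python
from collections import deque


def degree(adjacency):
    """Number of direct links of each node (row sums of the adjacency matrix)."""
    return [sum(row) for row in adjacency]


def betweenness(adjacency):
    """Unnormalized betweenness centrality of an undirected, unweighted network."""
    n = len(adjacency)
    neighbors = [[j for j in range(n) if adjacency[i][j]] for i in range(n)]
    bc = [0.0] * n
    for source in range(n):
        # Breadth-first search that counts shortest paths from `source`.
        sigma = [0] * n
        sigma[source] = 1
        dist = [-1] * n
        dist[source] = 0
        preds = [[] for _ in range(n)]
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest nodes back to the source.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return [value / 2.0 for value in bc]
```

On a four-node chain, the two inner nodes have degree 2 and betweenness 2, while the end nodes have degree 1 and betweenness 0, which illustrates how a node with few links can still sit on many shortest paths.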
Correction for spatial embedding: When we choose a particular study area, we impose an artificial boundary in space. These boundaries influence the climate network (Rheinwalt et al. 2012; Boers et al. 2013) by cutting links that actually connect nodes with outer regions, hence affecting the network measures. Here we adopt the boundary correction procedure suggested by Rheinwalt et al. (2012): we first generate 500 spatially embedded random networks (SERN) (Barnett et al. 2007), which preserve both the node positions and the distribution of the spatial link lengths of the original network. We then compute the network measures for all SERN surrogates. The boundary-corrected network measure is obtained by dividing the value of the measure in the original network by its average over the SERN surrogates.

Fig. 3 [...] Mean winter precipitation anomaly as the fraction of mean annual precipitation falling in winter (same period) for ERA5 reanalysis data. Anomalies are highly positive along the West Coast and slightly positive along the southern flank of the Appalachians. Highly negative anomalies exist in the central north
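The final division step of the boundary correction for spatial embedding can be sketched as below. The sketch assumes that the surrogate networks have already been generated and evaluated, and that the division is carried out node by node; the function name `boundary_corrected` and the treatment of nodes with a zero surrogate mean are hypothetical choices made for illustration.

```python
def boundary_corrected(measure, surrogate_measures):
    """Divide a node-wise network measure by its mean over surrogate networks.

    `measure` holds one value per node for the original network;
    `surrogate_measures` holds one such list per surrogate network.
    """
    n_surrogates = len(surrogate_measures)
    corrected = []
    for node, value in enumerate(measure):
        mean_surrogate = sum(s[node] for s in surrogate_measures) / n_surrogates
        # Assumption: nodes whose surrogate mean is zero have no reference
        # value, so they are marked as NaN instead of dividing by zero.
        if mean_surrogate > 0:
            corrected.append(value / mean_surrogate)
        else:
            corrected.append(float("nan"))
    return corrected
```

A corrected value above 1 then indicates that a node is more central than expected from its position and the link-length distribution alone.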

Calculation and network interpretation
In this section, we analyze the winter extreme precipitation pattern of the U.S. using the complex network measures introduced above. Our climate network, constructed using the ED metric (mentioned in Section 2.2), considers both the sequence and the amplitude of events when quantifying similarity. High-degree nodes (Eq. 2) represent regions of high connectivity, i.e., regions connected to many grid cells that exhibit similar variability in the occurrence and intensity of extreme precipitation. In our extreme precipitation network, we find a relatively low degree in the northwestern part of the U.S. (Fig. 4a), suggesting less similarity of extreme precipitation behavior with other regions. On the other hand, a high degree is observed in the eastern Pacific Ocean and the southwestern part of the U.S. To understand the connectivity pattern of these regions, we choose small boxes A (low degree) and B (high degree), in the northwestern part of the U.S. and in the eastern Pacific Ocean respectively, and determine the number of links connecting these boxes with other nodes in the network (Fig. 5a and b). We find that connections with region A are confined to a very small area centered towards the coast, indicating a narrow corridor of moisture transport as is typical for atmospheric rivers (Dettinger 2013; Xiong and Ren 2021; Hu et al. 2017; Gonzales et al. 2019). The connectivity of region B, in contrast, spans a larger area in the Pacific Ocean and extends to some parts of the southwestern coast. Such extended spatial connectivity of extreme precipitation indicates the presence of a larger atmospheric pattern that impacts the region, such as tropical cyclones, which tend to cause enhanced rainfall in this region (Woodruff et al. 2013).

We also observe high degree values in the Great Plains and northeastern parts of the U.S. We choose another small box C in this region, which lies roughly in the Mississippi river watershed (Fig. 5c). The connectivity pattern of this region indicates similar variability of extreme precipitation along the southwest-northeast direction. Furthermore, high-elevation regions such as the Cascades and some parts of the Rockies and the Appalachians show a relatively lower degree than regions of low elevation, as was also observed by Agarwal et al. (2022) for extreme precipitation networks constructed using edit distance for the Ganges river basin in India. Similar observations are made in the results obtained using the JRA-55 dataset (see Fig. S2a). Next, we study the spatial patterns of BC (Fig. 4b), which reveal some striking structures associated with the transition zones between different atmospheric flows (Molkenthin et al. 2014; Tupikina et al. 2016) during winter in the U.S.
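The box-based connectivity analysis used for regions A, B, and C can be sketched as a simple link count. The sketch assumes that a box is given as a collection of node indices; the function name `links_to_box` is hypothetical, and links between two nodes inside the box are included in the count.

```python
def links_to_box(adjacency, box_nodes):
    """For every node, count its links to the nodes inside a chosen box."""
    box = set(box_nodes)
    return [sum(adjacency[i][j] for j in box) for i in range(len(adjacency))]
```

Mapping the returned counts back onto the grid gives a connectivity map of the kind described for the three boxes.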
Along the northwestern coast of the U.S., we find high betweenness but low degree. This implies that although these are relatively small regions of similar precipitation dynamics, they are transition zones between different atmospheric flow directions (Molkenthin et al. 2014), possibly because of spatial confinement and orographic lift due to topographical features such as mountains. The high BC values continue southward along the entire western coast, lining the land-sea boundary. The results obtained from ERA5, however, deviate from those obtained from JRA-55, where the BC values decrease substantially south of 30° N.
We observe high BC values in the central U.S., i.e., from Texas towards the Midwest, and in the northeastern region, which are also regions of high degree. This implies that while the lower-elevation regions east of the Rocky Mountains (Great Plains) and the Appalachians (Coastal Plains) are large regions of spatially coherent extreme precipitation dynamics, big rivers and mountain features cause diversification of atmospheric flow, leading to different and strongly fragmented precipitation patterns. These observations are mostly similar to those seen in the network constructed using the JRA-55 dataset (Fig. S2b), except for the small disparity in BC values along the southwest coast. This may be due to the relatively high bias in JRA-55 precipitation data in the Pacific Ocean close to the tropics (Hassler and Lauer 2021).

Climatological interpretation
The low spatial connectivity of precipitation in the northwestern part of the U.S. (Fig. 4a) is caused by the effects of the Cascade and Rocky Mountains. Precipitation gets "trapped" west of these ranges and, thus, is not connected to the rest of the country, lowering the overall degree. At higher elevations, extreme precipitation requires different conditions than at the coast, so the northwest coast and the mountain ranges are also not connected. In contrast, rainstorms can travel more freely through the plains on the eastern side of the mountains, which leads to a higher regional similarity. The presence of the western Cascades results in orographic lift, effectively transforming water vapor into extreme precipitation and resulting in high BC values a little inland from the northwest coast (Fig. 4b).
The southwestern part of the U.S. along with adjacent regions of the eastern Pacific Ocean exhibits high connectivity due to a high fraction of winter precipitation despite low mean winter precipitation (Fig. 3). This can be explained by the fact that this part of the eastern Pacific is a separate, relatively small, and well-organized precipitation system (Zhang and Wang 2021) as also seen from Fig. 5b. Elevation and slopes are much lower here than further north, so rainstorms can penetrate further into the land and cause near-simultaneous precipitation along the land terrain.
The southwestern coast has high BC values similar to the northwestern coast, indicating that they may be related to the transition in opposing atmospheric flow direction. The western coast of the U.S. experiences heavy precipitation, and hence extreme streamflows, due to atmospheric rivers (ARs), which contribute 30 to 45% of total winter precipitation (Dettinger 2013; Xiong and Ren 2021; Hu et al. 2017). ARs are relatively narrow, filament-shaped conduits of moisture in the atmosphere transported from the lower latitudes to the mid and high latitudes (Gimeno et al. 2016; Guan and Waliser 2015; Ralph et al. 2019). AR activity starts during autumn and tends to shift southward along the Pacific coast later during the winter (Gonzales et al. 2019). However, these ARs may be associated with different regimes of large-scale Rossby wave breaking (RWB): anticyclonic wave breaking (AWB) in the northwest and cyclonic wave breaking (CWB) in the southwest (Hu et al. 2017) (Fig. 6a and b). The penetration of high BC values further inland (Fig. 4b) in the northwest U.S., close to the western slope of the Cascades, may be related to the AWB-ARs, which arrive more orthogonally to the western Cascades due to their westerly impinging angle, transforming moisture into precipitation through orographic lift. The CWB-ARs, on the other hand, have more southwesterly impinging angles and therefore arrive more orthogonally to the east-west oriented Olympics in the northwest U.S. and the northwest-southeast oriented Sierra Nevada along the southwest coast. Consequently, they cause intense precipitation along the western coast. The transformation of water vapor into extreme precipitation through orographic lift, albeit under different regimes of RWB, explains the high BC along the western coast. The relatively high degree in the southwestern region may be related to the high density of shorter-track ARs close to central and southern California. The southward seasonal progression of the mean latitude of the AR tracks could also explain the high BC values in this region (Gonzales et al. 2019).
The southwest-northeast (SW-NE) inclination in connectivity of the high-degree regions in the northeast U.S. and the Great Plains (Fig. 5c) is in agreement with Najibi et al. (2020), who found high similarity in anomalous extreme precipitation in winter in these regions.
The eastern side of the Rockies also has high BC values, which may be attributed to the pressure gradient seen in the atmosphere (Fig. 6a, b, and c) (Molkenthin et al. 2014). The area roughly coincides with the loosely defined region called Tornado Alley, where tornadoes occur very frequently (Concannon and Brooks 2000; Bluestein 2006). We also see the propagation of wind in the southwesterly direction at all atmospheric levels (Fig. 6a, b, and c). The IVT seasonal composite anomalies (Fig. 6d) also show anomalously high moisture transport in this direction. This flow pattern is modulated by the presence of the Rocky Mountains (Lukens et al. 2018), which suppress the storm-track activity by deflecting the westerly flow over land (Chang 2009). This leads to a SW-NE tilt in the upper tropospheric jet (Fig. 6a), which subsequently causes a downstream flow and hence high betweenness along those nodes. High BC values along the northeast coast of the U.S. may also be associated with the high baroclinic instability that forms due to the large land-sea temperature gradient in winter over the northeastern U.S. (Brayshaw et al. 2009), which leads to an intensification of extratropical storms on the leeward side of the Appalachian Mountains (Colucci 1976; Lukens et al. 2018). Extreme precipitation in this region is mainly related to an anomalously high upward lift of air along the coast due to high vorticity advection, frequent warm conveyor belts, and diabatic heating. The wind flow (Fig. 6a, b, and c) and high anomalous IVT (Fig. 6d) along the northeast coast lead to synchronous extreme precipitation in the region and hence to a high degree.

Conclusions
The climate network approach has proven to be a robust and promising framework for studying various climate extremes such as extreme monsoon precipitation (Malik et al. 2011; Boers et al. 2013; Boers et al. 2014b), the influence of El Niño (Boers et al. 2014a), and cyclone tracks (Gupta et al. 2021). In this work, we studied the spatial variability of extreme precipitation during winter in the U.S., which has a very complex topography. For this, we employed the edit distance metric to measure pairwise similarity between extreme precipitation time series at different locations. Most of the earlier methods (Malik et al. 2011; Stolbova et al. 2014; Boers et al. 2013; Boers et al. 2014a; Wolf et al. 2020) consider only the timing of events when studying similarities in event-like data. The edit distance emerges as a powerful alternative because it considers the amplitude or strength of extreme events along with their time of occurrence when calculating the similarity.
The extent of the coherent regions depends on the orography, the seasonal climatology, and the prevailing atmospheric circulation. Understanding the spatial extent of regions of coherent extreme precipitation is necessary for risk assessment of natural hazards. Through a combination of network measures, viz., degree and betweenness centrality, we were able to identify the different regions of the U.S. that exhibit distinctly different extreme precipitation dynamics. While the spatial patterns of degree differentiated between the extreme precipitation variability of the northwest and the southwest coast based on the associated large-scale atmospheric circulation, the high betweenness along the entire western coast brought to light the role of atmospheric rivers and of topographic barriers in causing extreme precipitation. The network measures also demarcated the "Tornado Alley" region (Concannon and Brooks 2000; Bluestein 2006) in the Great Plains, where tornadoes are more frequent. The high-degree pattern captured the southwest-northeast (SW-NE) inclination (Najibi et al. 2020; Lukens et al. 2018) of extreme precipitation due to the modulation of storms by the Rocky Mountains. Similarly, the modulation of extreme precipitation by other high ranges, such as the western Cascades and the Appalachians in the east of the country, was also reflected in the network connectivity.
Our complex network-based approach provides a comprehensive overview of the distinct regions that experience spatially coherent extreme winter precipitation in the U.S., albeit due to various climate processes. The similarity measure used in this study, the edit distance, proves to be a very promising alternative for studying extreme precipitation patterns in regions exhibiting very intricate climate variability, such as the U.S. Future work may include studying the effects of the increasing intensity of extreme precipitation on different large-scale monsoon systems and identifying possible teleconnections.
Author contribution AB and NM developed the theoretical formalism. AB carried out the experiment. MK and BG helped with result interpretation. BM and JK closely supervised the work. AB took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research, analysis and manuscript.
Funding Open Access funding enabled and organized by Projekt DEAL. This research has been funded by the Deutsche Forschungsgemeinschaft (DFG) within graduate research training group GRK 2043/1 "Natural risk in a changing world (NatRiskChange)" at the University of Potsdam.

Data availability
The reanalysis data that support the findings of this study are publicly available online: ERA5 reanalysis data (Hersbach et al. 2020), https://cds.climate.copernicus.eu/, and JRA-55 reanalysis data (Japan Meteorological Agency 2013).
Code availability Readers are requested to contact the corresponding author, Abhirup Banerjee, regarding code and analysis.

Declarations
Consent to participate Not applicable.

Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.