1 Introduction

1.1 Motivation

Coronavirus disease 2019 (COVID-19) has caused over 1,100,000 deaths and more than 100 million infections in the United States and over 6.9 million deaths and more than 750 million cases worldwide [17, 61]. Although vaccines help mitigate the harmful effects of the disease, in 2020 and early in 2021, non-pharmaceutical interventions (NPIs) were the primary method to protect individuals from exposure to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19. These interventions included formal shelter-in-place rules and guidelines, facility closures, limited seating in restaurants, reduction of interactions through physical distancing (i.e., ‘social distancing’), and travel-restriction policies to reduce mobility and transmission [20]. Such NPIs were generally effective [57].

In the United States, policies that invoke NPIs are typically administered at the state level. This has resulted in friction in local communities that seek to impose different (and often more stringent) standards than other areas of their state [3, 16, 29, 42]. To avoid spillovers of COVID-19, people were strongly advised (or even required) to stay within a local region for their daily activities [18, 41, 69]. However, when a local region (e.g., a metropolitan area) spans multiple states or experiences different risk levels than the rest of a state, it may be subject to conflicting policies and guidelines. To obtain administrative units for which it is reasonable to apply homogeneous NPI policies, we seek to construct regions that capture core geographies of social and movement behavior. We expect the spread of COVID-19 across these regions to be less pronounced than its spread across states [68, 77].

1.2 Background and related work

The objective of defining ‘functional’ geographic regions that may not follow administrative boundaries is not new [14]. It has deep theoretical roots in regional science, economic geography, and human geography [48, 60, 62]. Defining regions that are based on news markets, vacation trips, telecommunications, commutes, and migration [13, 33, 36, 54] has been a common practice for decades [25, 26, 33, 39]. More recently, trips from mobile phones and Global Positioning System (GPS) traces, flights, and social-media relationships have been used to define regions [12, 35, 43, 47, 49, 51, 60]. Regardless of the data source, such constructed regions have rarely been implemented in practice for policy purposes.

The COVID-19 pandemic has elicited new arguments for the use of functional regions for policy implementation [1, 6, 15, 32] and new computational experiments to delineate such regions and test whether or not their internal populations experience similar COVID-19 case rates over time. Hou et al. [43] divided two Wisconsin counties into regions using the WalkTrap community-detection algorithm on SafeGraph mobility data. These regions yielded effective boundaries for COVID-19 transmission, with about half of the infections occurring within the regions. Using SafeGraph trip data in California, Chang et al. [18] derived effective regions using a method that was based on minimum k-cuts. In another recent paper, Adams et al. [1] defined mobility regions in Colorado using movement data. They concluded that their constructed regions often aligned with the regions in Colorado’s county-based ‘jurisdictional zones’ for COVID-19 policy administration, but with misalignments that may be useful to evaluate potential changes to these regions. Buchel et al. [15] derived regions from SafeGraph data (at the level of census block groups in the U.S.) by detecting communities with modularity maximization. They observed that these regions often cross state borders.

Several researchers have observed that functional regions often persist substantially over time. Using Facebook movement patterns in the United Kingdom, Gibbs et al. [32] detected regions using the InfoMap community-detection algorithm. They found that regions evolved with time but did not change significantly after local authorities invoked NPIs. Using the same data set, Schindler et al. [67] derived communities that generally followed administrative regions but were smaller during periods with travel restrictions. In a study of commute-based regions in Austria, Iacus et al. [44] observed similar within-region rates of COVID-19 infections from week to week, including weeks with lockdown events.

Some models to forecast disease incidences in different geographic areas, such as the GLobal Epidemic and Mobility (GLEaM) model [8], incorporate commuting and flights to simulate connectivity between spaces that are not geographically adjacent. In the context of COVID-19, the GLEaM model has been used to estimate retroactive pathways of transmission that occurred before testing strategies were in place [24].

1.3 Capturing geographic disease dynamics

To assess the ability of functional geographic regions to capture cohesive areas with high COVID-19 case rates, it is desirable to know the transmission patterns of SARS-CoV-2. However, modeling the transmission of COVID-19 infections in networks of individuals is complicated by asymptomatic transmissions and other factors [5]. Phylogenetic strains of SARS-CoV-2 indicate that the virus’s subsequent mutations, such as the Delta variant, initially tended to stay within concentrated geographic regions [38]. However, as mutated variants of SARS-CoV-2 propagated, geographic transmission paths became too widespread to pinpoint.

Contact-tracing technologies that record geographic traces of infected individuals [27] have had mixed results because of underdeveloped technologies, uneven participation levels from individuals, and lack of administrative organization and oversight [42]. Despite a lack of information on the precise spatial transmission of SARS-CoV-2, one can assess how different sets of regional boundaries act as informal barriers to disease transmission. We posit that regions that one obtains from human behavior may help explain the spatiotemporal landscape of COVID-19 case rates (as in [43]).

1.4 Our approach

We investigate the extent to which boundaries that are based on five different human-network regions are able to ‘contain’ COVID-19 cases more effectively—with lower COVID-19 case rates and smaller case counts between regions—than state boundaries in the coterminous United States. We construct the human-network regions by detecting communities in five county-level networks (commutes, GPS-based trips, migration, Twitter connections, and Facebook friendships). The state boundaries correspond to the 48 coterminous states and Washington, D.C., yielding 49 total entities. Our results include (1) descriptive statistics of COVID-19 dynamics (cases, mutual case rates, and case-rate differences) between and within different types of regions, (2) a comparison of actual COVID-19 dynamics in our constructed regions and states to those of a random model of geographically-contiguous regions, and (3) an examination of temporal coordination within regions using Granger causality.

We expect to obtain large case rates within our functional geographic regions, with low transmission activity across regional boundaries. We also investigate whether case rates are more homogeneous within regions than between regions. Because cohesive metropolitan areas often straddle borders, we posit that the region boundaries that we construct from human-mobility dynamics will capture natural disease-transmission bottlenecks more effectively than social-media-based regions or administrative boundaries such as states.

By determining functional geographic regions for the management of the spatial transmission of a disease, we suggest flexible alternatives to using states as administrative units for policy implementation (as also articulated in [18]). Because these proposed alternatives are based on human behavior, they can help limit disease transmission while permitting some natural activity (such as social visits and travel).

1.5 Outline of our paper

Our paper proceeds as follows. In Sect. 2, we discuss the COVID-19 case data sets that we use in our study, our human-behavior networks, and our methods of analysis. In Sect. 3, we describe our results, which detail the types of regions that have the least COVID-19 spread across boundaries, and obtain a set of consensus regions. In Sect. 4, we summarize our work, discuss the implications of our work in the context of implementing regions for public policy, and describe limitations of our work. In the Appendix, we give more information about the similarities between networks, the similarities between their associated regions, and the results of various community-detection methods. We also show maps of our regions. We provide an online tool to explore consensus regions at https://doi.org/10.6084/m9.figshare.14071439.

2 Data and methods

2.1 Data sets and data preparation

We construct regions from five data sets that encode different types of interactions between people in the 3108 counties (i.e., nodes) of the coterminous United States. From each of these data sets, we construct an associated weighted network. We also use an unweighted county-adjacency network \(G_{a}\) and associate COVID-19 case data with the edges of \(G_{a}\). We treat the five networks that we use to create the regions as independent variables, and we treat the COVID-19 data on the edges of \(G_{a}\) as an outcome variable. We only consider COVID-19 case data across counties that are geographically adjacent.

See Fig. 1 for a schematic illustration of our approach.

Figure 1

Schematic illustration of our approach to obtain human-network regions through network partitioning. Each network-partitioning method has an input of (A) a network of movement flows or social-media connections between U.S. counties. We apply a community-detection algorithm to determine (B) a set of distinct regions. We use (C) a network \(G_{a}\) of county adjacencies and (D) distinguish edges between regions (\(E_{b}\), in yellow) from edges within regions (\(E_{w}\), in black). (E) We then weight all edges by COVID-19 case counts, mutual case rates, and case-rate differences. (F) We measure these values both between regions (in yellow) and within regions

2.1.1 Movement and social-network data

In each of the five human-behavior networks, a node represents a county and an edge signifies some type of mobility or social-media connection between two counties. In Table 1, we summarize basic statistical properties of these networks and the county-adjacency network. In each of the human-behavior networks, we weight the edges by the number of interactions between people in pairs of counties. The edge weights are sums of bidirectional flows (for movement networks) or connections (for social-media networks) between two counties. We allow self-edges, which we weight based on the number of interactions with origins and destinations in the same county. Our Twitter and Facebook networks do not include all counties, as some counties’ populations do not have associated accounts or activity on these networks.
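The following minimal Python sketch illustrates how directed county-to-county flows can be collapsed into undirected edge weights by summing both directions while retaining self-edges. The function name `undirected_weights` and the toy flow values are hypothetical and serve only as an illustration. It is not the code that was used in this study.

```python
from collections import defaultdict

def undirected_weights(directed_flows):
    """Sum flows in both directions into one weight per unordered county pair.

    directed_flows maps (origin, destination) to a flow count.
    Self-edges (origin == destination) are kept as they are.
    """
    weights = defaultdict(float)
    for (origin, destination), flow in directed_flows.items():
        key = tuple(sorted((origin, destination)))  # unordered pair
        weights[key] += flow
    return dict(weights)

# Hypothetical toy flows between three counties A, B, and C.
flows = {("A", "B"): 120, ("B", "A"): 80, ("A", "A"): 500, ("B", "C"): 10}
weights = undirected_weights(flows)
```

In this toy example, the two directed flows between A and B are merged into a single undirected edge, and the internal flow of A becomes a self-edge.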

Table 1 Basic statistics of our networks for the coterminous United States

We obtain commute data from the U.S. Census LODES data set of residence–workplace characteristics for the year 2015 [76]. Each flow represents commutes from home to work at the census block level. We obtain migration data from American Community Survey (ACS) estimates of county-to-county migration flows for a 5-year period (2013–2017) [75]. The flow estimates approximate the annual numbers of movers between counties for the 5-year period of the data.

We obtain GPS trace data for January and February 2020 from SafeGraph [66]. The origins of the mobile-phone traces are census block groups, and the destinations are points of interest (PoIs) at which travelers end a trip. We track the origin county (i.e., the county that contains the census block group) and the destination county of each trip. (We do not track intermediate counties.) Each trip is associated with a flow from one county to another (or is an internal trip within a county). We use data from 1 January 2020 through 29 February 2020 because they are recent months with business-as-usual (and pre-pandemic) movement landscapes.

To obtain social-media regions, we use data from Facebook and Twitter (which is now called \(\mathbb{X}\)). We use Facebook’s Social-Connectedness Index (SCI), which is the number of Facebook friendships between accounts in two counties divided by the product of the numbers of accounts in those counties [7]. The Twitter data consists of accounts with reciprocal mentions (i.e., ‘co-mentions’) between 1 January 2014 and 31 December 2015. We obtained reciprocal account pairs from geolocated tweets that we collected using the Twitter Streaming API [74]. Although co-mentions do not imply personal ties between Twitter users, reciprocal mentions between two accounts do indicate personal communications and possible interpersonal relationships [49].

In Table 6 in the Appendix, we indicate the correlations between the human-behavior networks.

2.1.2 Assigning COVID-19 cases using a county-adjacency network

We obtain COVID-19 case counts from The New York Times COVID-19 API [59]. We use data from the week ending 31 May 2020 through the week ending 1 May 2022. To determine the case rates per county, we obtain 2018 population data by county from the U.S. Centers for Disease Control and Prevention (CDC) [17].

To examine local SARS-CoV-2 transmission, we create a county-adjacency network \(G_{a}\). The nodes of \(G_{a}\) are the individual centroids of the 3108 counties in the coterminous United States. Each undirected edge of \(G_{a}\) connects geographically-adjacent counties (i.e., counties that share a physical boundary). There are 9120 edges in total. We represent COVID-19 cases between adjacent counties by calculating case counts (C), mutual case rates per 1000 individuals (CR), and case-rate differences (CD). We assign these values to each edge of a network as follows. The case count C of a pair of counties (i.e., nodes) is the sum of their numbers of cases. The mutual case rate CR of two counties is equal to the sum of the case counts of the counties multiplied by 1000 and divided by the sum of their resident populations. The case-rate difference CD between two counties is equal to the difference between the individual case rates of those counties. We put more credence into mutual case rates and case-rate differences than into case counts because (1) cases are population-dependent and (2) our case counts can overcount cases. Placing case-count data on edges counts COVID-19 cases multiple times when a node participates in multiple edges.
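As a small worked illustration of the three edge quantities, the following Python sketch computes C, CR, and CD for one pair of adjacent counties. The function name `edge_statistics` and the toy numbers are hypothetical. Taking the absolute value for CD is an assumption, as the text specifies only a difference between the two case rates.

```python
def edge_statistics(cases_u, pop_u, cases_v, pop_v):
    """Return (C, CR, CD) for the edge between counties u and v.

    C  : sum of the two case counts
    CR : mutual case rate per 1000 individuals
    CD : difference between the individual case rates per 1000
         (absolute value; this sign convention is an assumption)
    """
    c = cases_u + cases_v
    cr = 1000 * c / (pop_u + pop_v)
    rate_u = 1000 * cases_u / pop_u
    rate_v = 1000 * cases_v / pop_v
    cd = abs(rate_u - rate_v)
    return c, cr, cd

# Hypothetical counties: 50 cases among 10,000 residents
# and 150 cases among 40,000 residents.
c, cr, cd = edge_statistics(50, 10_000, 150, 40_000)
```

Here, C = 200 cases, CR = 4.0 cases per 1000 individuals, and CD = 5.0 - 3.75 = 1.25 cases per 1000 individuals. A county that participates in several edges contributes its cases to C once per edge, which is the overcounting noted above.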

We use all 3108 counties in the coterminous U.S. as nodes when constructing regions. However, when we examine the COVID-19 statistics of these regions, we omit five counties (New York, Queens, Kings, Bronx, and Richmond) that correspond to the five boroughs of New York City, as these counties are not included in The New York Times COVID-19 data set. These nodes participate in only seven total edges, so we omit them in our statistical calculations.

2.2 Constructing regions

2.2.1 Regional delineation using community detection

We detect communities in each of the five human-behavior networks. A ‘community’ of a network is a dense set of nodes that is connected sparsely to other dense sets of nodes [58]. We obtain different numbers of regions and different community assignments of counties for different community-detection methods. We use community detection to obtain hard partitions, so we assign each county (i.e., each node) in a network to exactly one community.

We measure the quality of our network partitions by calculating the modularity [28, 64] of these partitions. The modularity of a partition of a network is \(Q = \sum _{\ell}(e_{\ell \ell} - {b_{\ell}}^{2})\). The quantity \(e_{\ell m}\) is the fraction of a network’s total edge weight that connects communities \(\ell\) and \(m\), and \(b_{\ell} = \sum_{m} e_{\ell m}\) is the fraction of the total edge weight that is in or attached to community \(\ell\). The maximum value of modularity quantifies the amount of compartmentalization of a network [28, 64]. One expects Q to be large for a network partition with few edges or small total edge weight between its communities. We examine five different community-detection algorithms. We use the Louvain locally greedy method for modularity maximization [11], an old greedy method for modularity maximization [21], InfoMap [65], and WalkTrap [63] in the software package igraph (version 1.3.5) in the R computing environment [22]. (In igraph, the methods have the names cluster_louvain, cluster_fast_greedy, cluster_infomap, and cluster_walktrap, respectively.) We also use the REDCAP algorithm, which partitions a network into communities using a spatial minimum spanning tree [37]. Our main results use communities from the Louvain method, as this method yielded the largest values of maximized modularity \(Q_{{\mathrm{max}}}\). We show these modularity values in Table 2. We summarize our community-detection results for all five approaches in Table 9 in the Appendix.
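To make the modularity calculation concrete, the following Python sketch evaluates Q for a given hard partition of a weighted, undirected network. It uses the usual diagonal form, in which each community contributes its within-community weight fraction minus the square of its share of the total edge-end weight. The function name `modularity` and the toy network are hypothetical. The sketch only scores a partition and does not perform the Louvain optimization.

```python
def modularity(edges, community):
    """Modularity Q of a partition of a weighted, undirected network.

    edges     : list of (u, v, weight); self-edges are allowed
    community : dict that maps each node to its community label
    """
    total = sum(w for _, _, w in edges)      # total edge weight
    inside = {}    # edge weight with both ends in the same community
    strength = {}  # summed edge-end weight attached to each community
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    # e_ll = inside / total and b_l = strength / (2 * total)
    return sum(inside.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())

# Hypothetical toy network: two triangles joined by one edge.
edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1),
         (4, 5, 1), (5, 6, 1), (4, 6, 1),
         (3, 4, 1)]
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {node: "a" for node in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

For the toy network, placing each triangle in its own community gives Q = 2(3/7 - 1/4) = 5/14, whereas placing all nodes in a single community gives Q = 0.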

Table 2 Basic summary statistics of our constructed regions. We give the number \(n(r)\) of geographic regions, the maximized modularity \(Q_{{\mathrm{max}}}\), the total length d of the internal boundaries, the number \(E_{b}\) of edges between regions, the number \(E_{w}\) of edges within regions, and \(d/E_{b}\)

2.2.2 Geographic random regions

To supplement our comparison of the five human-network regions to states, we construct 1000 sets of geographic random regions. Each set has 44 polygons. The number 44 is the closest integer to 43.83, which is the mean number \(n(r)\) of regions of the human-network regions and the states. See Table 2 for all values of \(n(r)\). To construct these regions, we first select 44 county centroids (i.e., nodes of the county-adjacency network \(G_{a}\)) uniformly at random from the set of counties. We then generate a Voronoi diagram from these 44 county centroids; this diagram covers the coterminous U.S. with 44 Voronoi polygons. We assign county centroids to the same region if they are in the same Voronoi polygon. We repeat this process 1000 times (i.e., for 1000 sets of 44 randomly-generated centroids). This yields 1000 sets of geographic random regions; in each set, each node belongs to one of the 44 regions. We report mean values of our calculations across these 1000 sets of regions.
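The construction above can be sketched in a few lines of Python. A point lies in the Voronoi polygon of the seed that is nearest to it, so assigning each county centroid to its nearest seed centroid gives the same region labels as an explicit Voronoi diagram. The sketch assumes planar (projected) coordinates and Euclidean distance. The function name `random_regions` and the synthetic centroids are hypothetical.

```python
import random

def random_regions(centroids, n_regions, rng):
    """Assign each centroid to the nearest of n_regions random seed centroids.

    centroids : list of (x, y) pairs in a planar coordinate system
    Returns (seeds, labels), where seeds holds the indices of the seed
    centroids and labels[i] is the region index of centroid i.
    """
    seeds = rng.sample(range(len(centroids)), n_regions)
    labels = []
    for x, y in centroids:
        nearest = min(
            range(n_regions),
            key=lambda k: (x - centroids[seeds[k]][0]) ** 2
                          + (y - centroids[seeds[k]][1]) ** 2,
        )
        labels.append(nearest)
    return seeds, labels

# Synthetic stand-in for county centroids (the study uses 3108 counties
# and 44 seeds, repeated 1000 times).
rng = random.Random(0)
centroids = [(rng.random(), rng.random()) for _ in range(200)]
seeds, labels = random_regions(centroids, 5, rng)
```

Repeating the call with fresh random seeds yields further independent sets of regions.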

2.3 Methods for statistical analysis

2.3.1 Statistics and permutation tests for COVID-19 cases

We report statistics for case counts C, case rates CR, and case-rate differences CD for the five human-network regions, the states, and the geographic random regions. We then perform permutation tests in which we shuffle the edge labels (i.e., whether they are within-region edges or between-region edges) uniformly at random. For each permutation and for the real data, we then sum the case values (either C, CR, or CD) over the within-region edges. We run the permutation test 1000 times and thereby produce a distribution of sums for within-region edges. We compare this distribution to the actual sum of case values for within-region edges. We perform a separate permutation test for each of the three types of case values and for each region type.
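The following Python sketch illustrates a label-shuffling permutation test of this kind for a single case value and a single region type. The function name `permutation_test`, the fixed random seed, and the toy edge values are hypothetical. Summarizing the comparison as the fraction of shuffled sums that are at least as large as the observed sum is one reasonable choice, not necessarily the summary that is reported in Table 4.

```python
import random

def permutation_test(values, is_within, n_permutations=1000, seed=0):
    """Shuffle within/between edge labels and sum values on within-region edges.

    values    : one case value (C, CR, or CD) per adjacency edge
    is_within : one boolean per edge; True marks a within-region edge
    Returns (observed_sum, null_sums, fraction_of_null_sums_ge_observed).
    """
    rng = random.Random(seed)
    observed = sum(v for v, w in zip(values, is_within) if w)
    labels = list(is_within)  # copy, so that the caller's labels stay intact
    null_sums = []
    for _ in range(n_permutations):
        rng.shuffle(labels)   # keeps the number of within-region edges fixed
        null_sums.append(sum(v for v, w in zip(values, labels) if w))
    frac_ge = sum(s >= observed for s in null_sums) / n_permutations
    return observed, null_sums, frac_ge

# Hypothetical toy data: large values sit on the within-region edges.
values = [10, 9, 8, 7, 1, 1, 1, 1]
is_within = [True, True, True, True, False, False, False, False]
observed, null_sums, frac_ge = permutation_test(values, is_within)
```

Shuffling the labels (rather than the values) keeps the numbers of within-region and between-region edges fixed, so that each shuffled sum is comparable to the observed sum.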

2.3.2 Granger-causality tests for case rates

We examine Granger causality to assess whether or not the time series of COVID-19 case rates of a county helps to predict the time series of COVID-19 case rates of adjacent counties. A Granger-causality test produces a p-value for the null hypothesis that the COVID-19 case rate of a county does not improve inference of the COVID-19 case rate of an adjacent county using lagged values of the case rates. Because many public tracking services of COVID-19 data employ 7-day moving averages (e.g., the Georgia Department of Public Health [31] and The New York Times [59]) and the CDC reports case data and related data in weekly intervals [17], we use a lag of one week.

Disease transmission can occur in either direction (or in both directions) between adjacent counties, so we calculate Granger causality twice for each pair of counties by switching the dependent-variable and independent-variable roles of the two time series in a test.
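To illustrate the structure of a lag-1 Granger-causality test, the following Python sketch computes the F statistic that compares a restricted autoregression (the target series on its own lagged value) with an unrestricted one (which also includes the lagged value of the other series). All function names and the synthetic series are hypothetical, and the least-squares fit uses plain normal equations for brevity. A p-value would follow from the F distribution with 1 and n - 3 degrees of freedom, where n is the number of fitted time points. The sketch omits that step.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(rows, y):
    """Residual sum of squares of the least-squares fit of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def granger_f_lag1(x, y):
    """F statistic for the null 'x does not Granger-cause y' with one lag."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - 3  # fitted points minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)

# Synthetic weekly series in which y follows x with a one-week delay.
rng = random.Random(1)
x = [rng.random() for _ in range(60)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * (rng.random() - 0.5) for t in range(1, 60)]
f_xy = granger_f_lag1(x, y)  # x -> y
f_yx = granger_f_lag1(y, x)  # y -> x (roles switched)
```

Calling the function twice with the roles of the two series switched mirrors the two tests per county pair that are described above.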

We perform our analysis in Esri ArcGIS and the R statistical computing environment.

3 Results

3.1 Constructed regions

We use the Louvain method [11] of maximizing the modularity objective function to detect communities and create regions in our five human-behavior networks. Of these five networks, the commute network yields the most regions (with \(n(r) = 75\) regions), and the Twitter and migration networks yield the fewest (with 26 and 28 regions, respectively). See Table 2 for basic statistics of our networks, Fig. 2 for visualizations of state and human-region boundaries, and Fig. 1 for an illustration of our pipeline to examine case counts, case rates, and case-rate differences between and within regions. The commute network and GPS-trip network result in the largest values of maximized modularity \(Q_{{\mathrm{max}}}\). We also detect communities in the networks from the geographic random model. In the geographic random model, there are 1000 different sets of regions, with 44 distinct regions in each network. For this model, we report mean values of the numbers of edges between and within regions.

Figure 2

State boundaries and five human-region boundaries in the coterminous United States. We algorithmically detect the human-region boundaries from human-behavior networks using the Louvain method [11] of modularity maximization. We show the numbers of regions in parentheses

We use the county-adjacency network \(G_{a}\) to track when pairs of adjacent counties are assigned to the same region and when they are assigned to different regions. We denote the total number of edges that cross between two regions by \(E_{b}\), and we denote the total number of edges that remain within a region by \(E_{w}\). (The sum of \(E_{b}\) and \(E_{w}\) is 9120.) Because the geometry (specifically, the area and shape) of the regions and the numbers \(n(r)\) of regions are different in each network, some sets of regions provide more opportunities for crossings. The number \(n(r)\) of regions correlates both with the length d of the internal boundaries and with the number \(E_{b}\) of between-region crossings. The Pearson product-moment correlation coefficients are \(f(E_{b},d) \approx 0.986\),  \(f(E_{b}, n(r)) \approx 0.999\), and \(f(d, n(r)) \approx 0.997\). The ratio \(d/E_{b}\) is the length (in kilometers) of the internal boundaries per between-region crossing. We calculate that \(d/E_{b}\) is roughly 30 kilometers (see Table 2).
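The bookkeeping for between-region and within-region edges can be sketched as follows. The function name `split_edges` and the four-county toy example are hypothetical.

```python
def split_edges(adjacency_edges, region_of):
    """Return (between, within) lists of county-adjacency edges.

    adjacency_edges : list of (county_u, county_v) pairs
    region_of       : dict that maps each county to its region label
    """
    between, within = [], []
    for u, v in adjacency_edges:
        if region_of[u] == region_of[v]:
            within.append((u, v))
        else:
            between.append((u, v))
    return between, within

# Hypothetical toy example: four counties in two regions.
adjacency_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
region_of = {"a": 1, "b": 1, "c": 2, "d": 2}
between, within = split_edges(adjacency_edges, region_of)
e_b, e_w = len(between), len(within)
```

Every adjacency edge falls into exactly one of the two lists, so the two counts always sum to the total number of edges.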

3.2 COVID-19 cases between and within regions

We discuss mutual case rates (which we denote by \(\mathrm{CR}_{b}\) for between-region edges and by \(\mathrm{CR}_{w}\) for within-region edges) and case-rate differences (which we denote by \(\mathrm{CD}_{b}\) for between-region edges and by \(\mathrm{CD}_{w}\) for within-region edges) on edges. We report case rates as cases per 1000 individuals.

3.2.1 Region-type variation in case counts, case rates, and case-rate differences

We first measure the COVID-19 case counts between regions (\(\mathrm{C}_{b}\)) and within regions (\(\mathrm{C}_{w}\)). We expect to obtain larger case counts for region types (e.g., migration regions) with larger regions. The commute regions, Twitter regions, and migration regions have the largest differences between within-region case counts and between-region case counts (see Table 3), suggesting that these types of partitions effectively demarcate locations with large case counts. The commute regions have the largest within-region case counts, followed by the Twitter regions and then the migration regions. The case rates between regions (\(\mathrm{CR}_{b}\)) are lowest for the commute and trip regions (indicating a low penetration of cases per capita across the boundaries) and are highest for the Facebook and migration regions. The case rates within regions (\(\mathrm{CR}_{w}\)) are highest for commute and trip regions, and they are lowest for the Facebook regions.

Table 3 Mean values of COVID-19 case counts (\(\mathrm{C}_{b}\) and \(\mathrm{C}_{w}\)), case rates (\(\mathrm{CR}_{b}\) and \(\mathrm{CR}_{w}\)), and case-rate differences (\(\mathrm{CD}_{b}\) and \(\mathrm{CD}_{w}\)) between and within regions, along with the differences (ΔC, ΔCR, and ΔCD) in these values. The difference is positive when a between-region value is larger, and it is negative when a within-region value is larger. The rightmost column is an odds ratio. The case data spans the week ending 31 May 2020 through the week ending 1 May 2022. The values of the COVID-19 case data are means of the weekly values. It is desirable for case rates (respectively, case-rate differences) to be large (respectively, small) within regions and to be small (respectively, large) between regions

The case-rate differences within regions (\(\mathrm{CD}_{w}\)) are smallest for commutes and second smallest for states (see Table 3), indicating that counties in the same region for these two types of networks have similar case rates. For case-rate differences between regions (\(\mathrm{CD}_{b}\)), where larger values indicate more case-rate heterogeneity, we find that the migration regions and states (followed by the commute regions) are the most effective demarcators. The Facebook and trip regions are the least effective human-network partitions with respect to \(\mathrm{CD}_{b}\). The large differences in case rates across states seemingly suggest that states are more effective partitions than we posited initially. The geographic random model has the least pronounced differences in COVID-19 case counts, case rates, and case-rate differences between versus within regions, indicating that the regions in the geographic random model do not effectively demarcate different regions of COVID-19 cases.

3.2.2 Odds ratios for case counts

The COVID-19 case count on an edge (which we denote by \(\mathrm{C}_{b}\) for between-region edges and by \(\mathrm{C}_{w}\) for within-region edges) is sensitive to the number of potential case crossings between regions. To account for this, we calculate the odds ratio \(\frac{(\mathrm{C}_{b}/\mathrm{C}_{w})}{(E_{b}/E_{w})}\) to estimate the ratio of the case count between regions to the case count within regions. The odds ratio conveys the likelihood that cases cross regions. This ratio is largest for the Facebook regions, second largest for the geographic random model’s regions, and third largest for the states. By contrast, commute and trip regions have the smallest ratios (see Table 3). These results illustrate that human-movement regions are the most effective of the examined regions. Moreover, the regions that we create using migration data or even Twitter co-mentions are more successful than states at delineating areas with large COVID-19 case counts.
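As a brief numerical illustration of this ratio, consider the following Python sketch. The function name `odds_ratio` and the input numbers are hypothetical and are not values from Table 3. Treating the case counts as totals over the between-region and within-region edges is an assumption of the sketch.

```python
def odds_ratio(c_between, c_within, e_between, e_within):
    """Ratio of between/within case counts relative to between/within edge counts."""
    return (c_between / c_within) / (e_between / e_within)

# Hypothetical inputs: one third of the edges cross region boundaries,
# but those edges carry only one fifth of the cases.
ratio = odds_ratio(c_between=200, c_within=800, e_between=1000, e_within=2000)
```

With these inputs, the ratio is (200/800)/(1000/2000) = 0.5. A value below 1 indicates that between-region edges carry fewer cases than their share of edges would suggest.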

3.2.3 Statistical tests

We now test for statistical significance in COVID-19 case counts, mutual case rates, and case-rate differences. Our permutation tests indicate that almost all sets of regions have larger case counts within regions (\(\mathrm{C}_{w}\)) and smaller case-rate differences within regions (\(\mathrm{CD}_{w}\)) than one would expect if we had assigned the labels ‘within region’ and ‘between region’ to edges without considering geography (see Table 4). The values of \(\mathrm{C}_{w}\) are largest within commute regions, second largest within Twitter regions, and third largest within migration regions. The values of the within-region case rates \(\mathrm{CR}_{w}\) are most significantly different from the distribution from the permutation test for commute regions and then states, Twitter regions, and trip regions. For the regions in the other networks, we do not observe a significant deviation from distributions from the permutation tests. The geographic random regions have the largest within-region case-rate differences \(\mathrm{CD}_{w}\). This is unsurprising, as we created these regions randomly instead of from human-behavior data. Of the human-network regions, the Twitter and migration networks yield the smallest within-region case-rate differences. Therefore, for these regions, adjacent counties in the same region tend to have similar case rates.

Table 4 Results of permutation tests across region types for expected and actual COVID-19 case counts, case rates, and case-rate differences

Our results illustrate that states may be somewhat effective at delineating regions based on COVID-19 case rates. Our tests of statistical significance also illustrate that commute regions effectively delineate regions and that states and Twitter regions perform better than we expected.

We now describe the results of our two Granger-causality tests [71] for each pair of counties. In these tests, we consider only case rates, as we want to capture population-normalized waves of COVID-19. Whenever both tests are significant for a pair of adjacent counties, we conclude that there is evidence of Granger causality of potential disease transmission between them. Effective regions have few statistically significant Granger causalities for between-region (\(\mathrm{CR}_{b}\)) pairs and many statistically significant Granger causalities for within-region (\(\mathrm{CR}_{w}\)) pairs.

In Table 5, we show the percentages of county pairs with a Granger-causality p-value of at most 0.001 for both within-region pairs and between-region pairs. At the 0.001 significance level, 30–50% of the between-region pairs are significantly coordinated temporally (i.e., they are Granger causal in at least one direction) and about 45% of the within-region pairs are significantly coordinated temporally (see Table 5). All types of regions have a similar number of pairs of counties that are coordinated temporally.

Table 5 Results of our Granger-causality and Kolmogorov–Smirnov (KS) tests

We use a Kolmogorov–Smirnov (KS) test [55] to produce a D-statistic, which we use to evaluate whether or not differences are significant. We find that pairs of counties in the commute regions and Twitter regions are significantly coordinated temporally.

3.3 Consensus regions

To develop policy, it is useful to have a single set of regions to enable the implementation of stay-at-home orders and other mobility-related NPIs that are consistent with the severity of local outbreaks. Our method to obtain consensus regions (see Sect. 2) results in 31 regions and a maximized modularity of \(Q_{\mathrm{max}} \approx 0.92\) (see Fig. 3). In the depicted consensus regions, the state boundaries are often preserved; this is convenient administratively.

Figure 3

We construct consensus regions in the U.S. using an unweighted combination of the states and the regions that we obtain from four human-behavior networks. We do not include the Facebook regions in the consensus regions because they are not effective at demarcating COVID-19 cases. These consensus regions indicate areas of strong within-region connectivity and weak between-region connectivity. (We computed the depicted regions using Louvain modularity maximization in the software package Gephi (version 0.10.0) [9])

To allow policy makers to explore multiple scenarios for their communities, we have developed an online tool that creates on-the-fly regions for state, commute, migration, and trip networks (because these networks produce the most effective COVID-19 regions in our study). Users can change the relative weightings of these input networks to customize regions. They can also download images of the resultant regions and export data (which indicates the region assignments of all counties).

4 Conclusions and discussion

We used human-mobility networks and social-media networks to construct functional geographic regions, which capture natural movements and social interactions. We then evaluated how effectively state boundaries and these regions capture natural boundaries in the geographic spread of COVID-19 infections. We found that states, which were the predominant regions for administering policies for COVID-19 mitigation, yield less effective boundaries than the regions that we constructed from a commute network. We also found that states are more effective than the regions that we constructed from social-media networks and more effective than a random model of geographically contiguous regions.

It is reasonable that the regions from the commute network are effective. Human-mobility regions are anchored by metropolitan areas. This yields strong connections in urban centers and suburbs, with weaker connections in exurban areas. Consequently, mobility-based functional regions tend to have many COVID-19 infections within regions and relatively few cases between regions. This conclusion reflects well-known regional-science principles that commuters and movers tend to follow an urban hierarchy with anchor cities and peripheries [34, 39, 45]. A regional approach is helpful for examining the spread of diseases (such as COVID-19) that have scant geographic transmission statistics. Based on our findings, we suggest that it is important to explore consensus regions that are derived from human-behavior networks as ad hoc administrative areas for making policy decisions for COVID-19 and other infectious diseases.

Applying policies and messaging to county-based regions instead of states poses an administrative burden that requires coordination and cooperative legislating. Nevertheless, during the COVID-19 pandemic, U.S. governors created multi-state regions [53, 70] and local authorities in the United Kingdom enacted specialized policies at local levels, rather than at the national level [32]. There are also county-level coalitions in economic development (e.g., the longstanding 420-county Appalachian Regional Commission [4]), and the U.S. federal government issues severe weather warnings (e.g., tornado, fire, storm, hurricane, and wind advisories) at the county level. Local-level operations have also yielded improvements in a variety of health systems. For instance, several years ago, the U.S. Organ Procurement and Transplantation Network implemented county-level liver-transplant regions that are based on supply-and-demand optimization as an improvement over state-level regions [30]. Functional regions may also be useful for examining the practicality of proposed inter-county alliances. In our work, for example, we did not find any regions that resemble the proposed region of Greater Idaho [19]. Instead, our regions illustrate that counties in Oregon have few existing connections to counties in Idaho.

When implementing regions in health-related situations, it is important to consider local variations. Administrative and household-level responses to COVID-19 varied across U.S. states. For example, testing rates for SARS-CoV-2 infections were different in different states [72]. There were also stark differences in vaccination rates across areas for both political and accessibility reasons [78]. Notably, vaccine-uptake rates were lower for socially vulnerable populations (as defined by low socio-economic status, household composition, a lack of access to healthcare, and disability status) [10]. Mobility behavior during lockdowns also depends on factors such as socioeconomic status [52].

Our work has a variety of limitations, and it is important to highlight several of them. A key shortcoming is that our human-behavior data are not up to date. Our mobility and social-media data predate the pandemic, so they may be misaligned with actual movement and information exchanges between counties during the pandemic. For instance, the migration data are from the period 2013–2017, the Twitter data are from 2014–2015, and the mobility data are from January and February 2020. Another shortcoming is that conducting our research at the county level entails a mismatch in granularity across the United States. Some counties have millions of residents and encompass large geographic areas, and other counties have few residents. Additionally, because it is difficult to detect the spatial transmission of SARS-CoV-2, we used rates in adjacent counties as a proxy for geographic transmission. However, we lack evidence of actual contagion events across these areas. Inevitably, one can also emphasize methodological limitations, such as in the choices of community-detection methods and other computations. For example, we made subjective choices of descriptive and inferential statistics, and one can certainly calculate other statistics to attempt to capture variations within and across regional boundaries.

In future work, we hope to account for heterogeneities in COVID-19 responses and NPI administration. We also plan to incorporate the temporal dynamics of spreading processes that arise from local and seasonal events (such as spring breaks from school, holidays, and large festivals [23, 56]) that we did not capture in our analysis. Events such as the lifting of lockdown policies are also important. Directly after a lockdown, increased human movement is often not associated with an increased spread of infections [2]. Indeed, functional geographic regions that one derives using data from lockdown periods have smaller areas than regions that one derives from data that are captured after lockdowns are lifted [67]. Extensions of our analysis can incorporate localized spikes in movements and differences across time to capture seasonal changes in regions. As suggested in [32], using data with fine-grained time resolution (such as real-time data) may help capture the flexibility and elasticity of boundaries.

It is also important to consider the spatial resolution of social-media data and other ‘non-traditional’ sources of disease-spread data [50]. We performed our analysis at the county level, but a similar analysis at other scales (such as the neighborhood scale) likely would yield different results. Constructing functional geographic regions on different scales may reveal how regions change, agglomerate, shrink, and expand with time.