2.1 Overview

A major feature of our study is that we use new methods for measuring residential segregation that make it possible for us to assess levels and trends in segregation with consistent accuracy across a wider range of measurement circumstances. This includes combinations of group comparisons and community settings where trustworthy measurements of segregation previously have not been possible. More specifically, we measure the dimension of segregation known as evenness using refined versions of two familiar and widely used measures, the dissimilarity index (D) and the separation index (S) (Footnote 1). The versions of the measures we use are free of index bias, a problem that poses major challenges for measuring segregation in many situations. They thus yield index scores that are accurate and trustworthy in situations where scores obtained using the conventional measurement approaches of previous studies would be distorted by index bias. In the past, the problem of index bias forced researchers to choose between two undesirable options. One option is to measure segregation across a more comprehensive and representative range of circumstances but with an understanding that the index scores obtained are in many cases untrustworthy and potentially misleading because they are distorted by bias. The other option is to restrict the scope of the analysis to a much smaller and less representative set of combinations of group comparisons and community settings where index bias is likely to be negligible and scores for standard versions of segregation indices are trustworthy and can sustain close analysis of cases. The measurement methods we use make it possible for us to sidestep these difficult choices and avoid the undesirable consequences that accompany them.
The benefit of using these new measurement methods is that we are able to obtain segregation index scores that are consistently accurate across a much broader range of measurement circumstances (e.g., combinations of group comparisons and community settings) than has been possible in previous research. The consistent accuracy of the unbiased measures enables us to draw conclusions about the levels and patterns of variation in segregation across group comparisons, across communities, and over time with greater confidence. Additionally, it allows us to selectively conduct close analysis of index scores for individual cases including, for example, tracking changes in segregation over time for a small subpopulation (e.g., Latino immigrants) in a small nonmetropolitan community, an analysis that cannot be sustained with conventional measurement practices used in past research. The task we seek to accomplish in this chapter is to first provide an overview of the conceptualization of residential segregation and the motivations for studying it and to then highlight the features of our study design that enable us to make new and important contributions to research on this topic.

We organize the discussion in this chapter as follows. We first review the broad concept of residential segregation, the research concerns that motivate our study, the aspects of segregation that are most relevant for our research concerns, and the implications this has for choices for measuring segregation. We then review the basic features of our study design including the community-level study units, the micro-level units, the group comparisons we examine, and our coverage of group comparisons across communities. Then we identify the sources of data we use to measure segregation and the spatial units we use for measuring segregation within individual communities. Finally, we review the issues of measuring residential segregation, focusing on two major points. First, we give attention to the details of how index scores are calculated. Second, we describe how we use new methods to obtain index scores that are superior to those used in past research because they are free of index bias and note how we use different indices to measure different aspects of segregation. Methods are crucial to any empirical study and, as the saying goes, the devil is often in the details. But thorough discussion of the details of measurement tends to be dry and tedious. So, we try to keep the discussion in this chapter relatively brief and refer readers to Fossett (2017) for more detailed reviews of the issues involved.

2.2 What Is Residential Segregation and What Motivates Us to Study It?

Massey and Denton characterized residential segregation as “the degree to which two or more groups live separately from one another, in different parts of the urban environment” but recognized that it is more complex on closer consideration because “groups may live apart from one another … in a variety of ways” (1988:282–283). Accordingly, researchers view residential segregation as having multiple dimensions that together encompass the variety of ways in which groups can be differentially distributed across spatial locations in a community, giving rise to varied patterns, potentialities, and consequences (Stearns & Logan, 1986; Massey & Denton, 1988). That said, it is safe to say that the literature on residential segregation of racial and ethnic groups is primarily motivated and guided by concerns about aspects of segregation that are directly and indirectly associated with group inequality across many domains. Thus, Massey (1990:333), Orfield and Lee (2005), Peterson and Krivo (2010), Quillian (2017), and many others have argued segregation warrants sustained attention from social scientists because it carries the potential to separate racial groups across different neighborhoods in a manner that produces racial inequality in neighborhood conditions including, but not limited to, differential exposure to poverty, quality of schools and learning environments, crime and violence, and access to resources and opportunities for life chances and social mobility.

We recognize that segregation can involve patterns that are sociologically interesting apart from their connection with stratification-related aspects of spatial distribution. But our study is not motivated by concerns about these more “benign” aspects of group differences in spatial distribution. Instead, we focus on residential segregation because the pronounced and enduring patterns of segregation seen in communities across the United States are often centrally implicated in social stratification processes and outcomes at the individual and group levels. We are hardly unique in this regard as concerns about stratification-related aspects of segregation motivate many, perhaps most, of the large number of empirical studies investigating residential segregation by race and class. But we call attention to this basis for focusing on segregation because some measures of segregation serve better than others for identifying when aspects of segregation that are most relevant for stratification and inequality are present. Specifically, of the dissimilarity index and the separation index, two of the most widely used measures of evenness, the separation index is clearly better for the purpose of identifying when group differences in distributions across neighborhoods create the potential for majority-minority inequality on advantages and disadvantages associated with neighborhood of residence.

2.3 Preliminary Comments on Index Choice

We explain our views on index choice in more detail later in this chapter. But, setting aside technical issues in segregation measurement for the moment, the heart of the matter is relatively simple and important to discuss now as part of understanding how segregation is conceptualized and operationalized. Separation of groups across different neighborhoods in the community is a necessary precondition for groups to experience systematic inequality on location-based outcomes. When separation is pronounced, groups live apart from each other in different parts of the residential environment of the community and group inequality on location-based outcomes becomes logically possible, and potentially empirically common and important. When separation of groups across neighborhoods is minimal, groups reside together in the same parts of the residential environment of the community and share similar neighborhood environments. Consequently, group inequality on location-based outcomes is not logically possible. We are interested in identifying when segregation takes the form where groups live apart from each other in different neighborhoods because this identifies communities where segregation creates the potential for group inequality on location-based outcomes to exist. The separation index provides a reliable signal of whether segregation involves this kind of pattern or not (Fossett, 2017) (Footnote 2). The dissimilarity index does not.

We bring this point up early on in our discussion because the choice for how to measure segregation is highly consequential in this study. The findings we obtain using the separation index in some cases differ dramatically from the findings we obtain using the dissimilarity index. This is particularly true for findings regarding the level of segregation and nature of change in segregation over time in nonmetropolitan settings and also for Latino households in new destination communities. For example, analysis of scores for the dissimilarity index suggests White-Latino segregation initially emerges at medium levels when Latino households first begin to settle in communities where previously there was little or no Latino presence and then over time segregation begins to decline and converge on levels seen in communities where Latino presence is sizeable and well-established. Analysis of scores for the separation index suggests a fundamentally different story wherein White-Latino segregation emerges at very low levels when Latino households first arrive in new destination communities and then over time segregation increases and begins to converge on levels seen in communities with established Latino presence.

The literature on segregation measurement has for many decades noted that the dissimilarity index has significant conceptual and technical problems. But researchers have tended to overlook these problems for a variety of reasons. The measure has been widely used in empirical studies, so it is familiar and provides continuity with past research. It also is relatively easy to calculate, and many believe it has an appealing interpretation. Finally, researchers tend to not view the conceptual and technical limitations of the dissimilarity index as particularly concerning because some studies have reported that findings obtained using the dissimilarity index are often similar to findings obtained using other, technically superior measures. We consider these issues in more detail below. But we preview discussion relating to the last point by noting that there is no dispute regarding whether scores for dissimilarity can diverge from scores for other indices; they can and sometimes do. So, previous studies that reported obtaining results using the dissimilarity index that were similar to results obtained using other indices should be seen only as fortunate situations where index choice did not matter; they cannot be construed as establishing that index choice never matters. Fossett (2017) identifies both circumstances under which index choice is less likely to matter and circumstances under which index choice is more likely to matter. He also documents that the issue has practical importance by showing that in many sociologically interesting circumstances results obtained using the dissimilarity index can and often do differ dramatically from results obtained using other indices, particularly the separation index.
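To make the possibility of divergence concrete, the following minimal sketch computes the conventional (standard, not bias-corrected) versions of D and S for a hypothetical community. The function names and the household counts are illustrative assumptions and are not taken from the study. In the example, a small minority group is entirely absent from half of the areas but remains a small share of households everywhere it is present, so the distribution is quite uneven (D is moderately high) even though the two groups live in areas with similar composition (S is low).

```python
def dissimilarity(w, b):
    """Conventional D: half the summed absolute difference in area shares."""
    W, B = sum(w), sum(b)
    return 0.5 * sum(abs(wi / W - bi / B) for wi, bi in zip(w, b))


def separation(w, b):
    """Conventional S: White mean minus minority mean of area proportion White."""
    W, B = sum(w), sum(b)
    mean_w = mean_b = 0.0
    for wi, bi in zip(w, b):
        if wi + bi == 0:
            continue  # empty areas contribute nothing
        p = wi / (wi + bi)          # proportion White in the area
        mean_w += (wi / W) * p      # weighted by share of White households
        mean_b += (bi / B) * p      # weighted by share of minority households
    return mean_w - mean_b


# Hypothetical community (assumed data): ten areas of 100 households each.
white = [90] * 5 + [100] * 5
minority = [10] * 5 + [0] * 5

d_score = dissimilarity(white, minority)
s_score = separation(white, minority)
print(f"D = {d_score:.3f}, S = {s_score:.3f}")  # D = 0.526, S = 0.053
```

In this configuration the average White household lives in an area that is about 95 percent White and the average minority household lives in an area that is 90 percent White, which is why S is close to zero even though D exceeds 0.5.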

We find results for the dissimilarity index and the separation index often diverge in our study, sometimes by large amounts. When results for the dissimilarity index diverge from results for the separation index, we are hard pressed to see a basis for prioritizing results for the dissimilarity index over the separation index. To the contrary, our view is that the more one understands about how it is possible for scores on the dissimilarity index and the separation index to diverge, the less confidence one will place in the dissimilarity index. We outline the basis for that view here. What we ask of researchers who have grown comfortable with relying on the dissimilarity index is this: Upon encountering the fact that other measures yield results different from those obtained using the dissimilarity index, please be open to reconsidering habits that, while familiar, are weakly justified. We believe doing so will lead to a better understanding of how results for the dissimilarity index and the separation index can and do vary and that in turn will provide a more informed basis for appreciating what we can learn about segregation using different measures.

2.4 Details of Study Design

As we noted earlier, segregation is a community-level phenomenon relating to how members of two groups are distributed across spatial subregions or neighborhoods within the community. Segregation indices provide quantitative scores summarizing particular aspects of group differences in residential distribution in the community. To support this study, we prepared a database of index scores to document the levels of segregation for particular group comparisons in individual communities at different points in time. We then performed statistical analyses to establish how segregation varies across group comparisons, across communities, and over time. As is necessary in any study of this nature, we had to make a variety of choices relating to research design and measurement. In this section we provide a brief summary and rationale for some of the most important choices.

Specifically, we review the main elements of our study design, including the data we use and the communities we examine. We devote a later section to an extensive discussion of measurement in order to provide a clear and thorough technical basis for justifying our choices. But we do not intend for this chapter to be a detailed technical introduction to new methodologies for segregation measurement. One of the authors of this work has published a monograph (Fossett, 2017) that provides a detailed review of the relevant technical issues. We will draw on the central points of those technical discussions and clarify how the issues matter for our goals of conducting a study of levels and trends in residential segregation. At certain points, however, we will refer readers to this earlier work for technical details, noting that it is published as open access and so can be obtained as a free download from the publisher’s website (Footnote 3).

2.4.1 Measuring Segregation in Metropolitan Areas, Micropolitan Areas, and Noncore Counties

One of the contributions of our study is that we examine segregation across the full range of communities in the United States. Specifically, we measure segregation for nearly all metropolitan areas, micropolitan areas, and noncore counties, which taken together cover all of the United States. Metropolitan areas and micropolitan areas are core-based statistical areas (CBSAs) as defined by the U.S. Census Bureau. Each CBSA comprises one or more contiguous counties whose populations are socially and economically integrated with an urban core of at least 10,000 inhabitants (Footnote 4). CBSAs with an urban core reaching or exceeding 50,000 in total population are designated as metropolitan. By definition, micropolitan areas are nonmetropolitan, but they are not generally rural in character because they have a nontrivial urban core of at least 10,000 but fewer than 50,000 inhabitants. As the term implies, noncore counties are counties not associated with a CBSA. Not surprisingly, they often are rural in character since they do not have a significant urban core and generally have small populations and low population density. These three categories of communities – metropolitan CBSAs, micropolitan (nonmetropolitan) CBSAs, and noncore counties – cover the entire land area and population of the United States. In all there are 960 CBSAs; 384 are metropolitan and 576 are micropolitan. We also measure segregation in the 1,355 noncore counties that are not included in a CBSA.
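The size thresholds described above amount to a simple classification rule, which can be sketched as follows. This is only an illustration of the 10,000 and 50,000 urban-core cutoffs stated in the text; the function name and the convention of passing `None` for a county that belongs to no CBSA are assumptions made for the example, and official delineations also depend on commuting ties among counties that are not modeled here.

```python
from typing import Optional


def classify_community(urban_core_population: Optional[int]) -> str:
    """Classify a community by the population of its urban core.

    `None` (an assumed convention) marks a county not associated with any CBSA.
    """
    if urban_core_population is None or urban_core_population < 10_000:
        return "noncore"
    if urban_core_population >= 50_000:
        return "metropolitan"
    return "micropolitan"  # core of at least 10,000 but fewer than 50,000


# Counts reported in the text: 384 metropolitan + 576 micropolitan CBSAs.
cbsa_counts = {"metropolitan": 384, "micropolitan": 576}
total_cbsas = sum(cbsa_counts.values())
print(total_cbsas)  # 960
```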

Depending on researcher interest, residential segregation can be considered at various macro-level domains ranging from expansive spatial domains such as state and region, to intermediate-level spatial domains such as metropolitan and nonmetropolitan communities, and even down to subregions within communities (e.g., central city and suburban ring). Our interest is with relatively self-contained metropolitan and nonmetropolitan communities where it is reasonable to view residential dynamics as playing out within a single broad housing market. We acknowledge that larger communities may have spatially segmented housing markets – central city and suburbs for example. But for our purposes, these lines of balkanization in housing markets are part of the broader dynamics that produce segregation for the community overall. We agree that differential segregation within spatially segmented subregions of a community (e.g., segregation in suburban sub-communities in a metropolitan area) is a valuable focus of study. But our concern here is with overall patterns of segregation across the full housing market for the community.

Empirical studies of segregation tend to focus on metropolitan areas and many influential studies have focused on only the largest 50–60 or the largest 100 metropolitan areas. These communities are important and deserve close attention. But we believe it is equally important to examine segregation in smaller metropolitan areas, in micropolitan areas, and in noncore counties as patterns of segregation in the largest metropolitan areas are not necessarily representative of patterns across the rest of the country. It is relatively uncommon for empirical studies of segregation to include micropolitan areas and noncore counties. One methodological reason for this is that researchers must use small spatial units such as census blocks to measure segregation in smaller communities, and this raises concerns that index scores will be distorted by index bias. This concern may apply to past research, but not ours because we use refined methods to obtain index scores that are free of index bias even when using data for households in combination with small spatial units. Another reason segregation research has neglected examining segregation in micropolitan areas and noncore counties is that some audiences and perhaps also some researchers question whether residential segregation is substantively important in smaller communities. On this point we can acknowledge that the patterns documented for White-Black segregation in Chicago, Cleveland, Detroit, Milwaukee, Newark, and other notorious cases are qualitatively and quantitatively distinctive due to the scope and scale of segregation patterns in these large metropolitan areas. 
Thus, we note that the term “hypersegregation,” coined by Massey and Denton (1988) to describe the pattern where segregation reaches high levels on many dimensions of segregation simultaneously, has only been applied to segregation patterns seen in a small set of the largest metropolitan areas where regions of concentrated minoritized group presence span many square miles in a manner that cannot occur in smaller communities.

At the same time, however, we do not hesitate to assert that residential segregation is an important marker of racial inequality in smaller communities and it carries important consequences for life chances in both the short and long run. Thus, while residential segregation may be less consequential for school segregation in small communities where students in public schools sometimes attend a single school, segregation may be more consequential for a host of other things including, as a brief list of examples: exclusion from police, fire protection, and ambulance service zones; exclusion from public water and sanitation systems; exclusion from service zones for utilities and maintenance of safe roads and drainage systems; exposure to natural hazards such as flooding; exposure to disamenities based on proximity to stockyards, sewage treatment plants, garbage dumps and landfills; and exposure to industrial emissions and waste products affecting air and groundwater.

Segregation in smaller communities often relegates minoritized group populations to less desirable, administratively neglected areas where residents are subject to disamenities and hazards and do not benefit from basic social and municipal services and effective enforcement of protective regulations. A particular, but not necessarily exceptional, example is seen in the colonia communities of South Texas border regions. Colonias are small, usually predominantly Latino, low-income residential areas in rural portions of medium-sized metropolitan areas, micropolitan areas, and noncore counties. Their populations often reside outside of city and county administrative boundaries – in many cases due to selective annexation practices – and as a result often have no potable water, no public sanitation, no police and fire protection services, poor roads and infrastructure, no public transportation, or other social services. Many are subject to having flooded homes and washed out roads during ordinary thunderstorms due to neglect in public works for flood control and road maintenance. Census tracts are too large to capture the populations of these residential areas because they often mix the populations residing in individual colonia settlements with populations residing in incorporated places or in suburban and exurban neighborhoods. In contrast, census blocks typically do not mix the populations residing in colonia settlements with other populations.

The measures of segregation we use, particularly the separation index (S), establish whether minoritized group populations live apart from the White majority population in the residential areas in smaller communities. Scores on S provide a reliable marker for the structural potential for group inequality on the area-based outcomes just mentioned, as separation of groups across spatial units is a fundamental logical prerequisite for group inequality on location-based outcomes. In sum, there is no question that segregation can be pronounced in small communities and can have important consequences for life chances in those settings. Thus, our study makes a valuable contribution by using improved methods to document segregation in smaller communities.

2.4.2 Coverage Spanning Three Decades

Our study spans the time frame 1990 to 2010. We adopt the CBSA designations used in the 2010 Census of Population and apply the county-based community definitions for 2010 in 2000 and 1990. In sharp contrast to cities, county boundaries are highly stable over time. So, county-based definitions maintain consistent spatial definitions of communities over the study period. We reviewed counties to identify those that changed boundaries over time in a way that could potentially lead to a significant change in the spatial definition of a CBSA or a noncore county. Only 5 noncore counties were affected, and we excluded them from our analysis. Our coverage of all U.S. communities extends back from 2010 as far as is feasible with available data. The limiting factor is that block-level data needed to measure segregation in smaller metropolitan areas and nonmetropolitan areas are not available in the full coverage form needed before 1990. As we completed this study, the 2020 Census of Population was conducted. But, for a variety of reasons, the 2020 data we would need to extend this study were not yet available. So, we must defer extending this analysis to include data for 2020 to a later study.

2.4.3 Group Comparisons

We assess segregation patterns for three White-nonwhite group comparisons that can be maintained consistently over the 1990–2010 timeframe using public census tabulations for small areas. The specific groups we study are broad panethnic groups routinely considered in empirical studies of residential segregation in the United States. They are identified based on responses to separate census questions on race and Hispanic identity and include: Non-Hispanic White (hereafter simply “White”) households, Black (African American) households, Latino (per census terms and of all races) households, and Asian (Asian American) households. While we would prefer to do so, it is not possible to identify Black and Asian households separately as Hispanic and Non-Hispanic in all three study years when using data for households at the block level. Due to this limitation of the data, it is logically possible to have the same household represented in the group counts for Black and Latino households and in the group counts for Asian and Latino households. This could in principle pose problems for measuring Black-Latino and Asian-Latino segregation in some communities, and so we give limited attention to examining these comparisons in our study. The potential impact on measures of White-nonwhite group segregation is much smaller. We identified communities where this issue was a potential concern (i.e., based on high percentages of Latino persons among Black and Asian persons) and performed robustness checks by comparing results of analyses with these White-Black and White-Asian comparisons included and excluded to assure our findings for White-Black and White-Asian segregation were not significantly affected by these cases.

2.4.4 Combinations of Group Comparisons Across Communities and Time

The absolute and relative size of each group’s presence in a community varies across communities and can change over time in any single community. Many communities are diverse, with all four groups considered here being present in non-negligible numbers, and many others are less diverse. We apply minimum population requirements to include particular segregation comparisons for a community in any of the three census years. The primary filters we apply require that both groups in the segregation comparison have a minimum of 50 households and 150 persons in the community overall and constitute at least 0.5 percent of the households in the segregation comparison. These thresholds are much lower than those typically used in previous research. We are able to adopt these more inclusive (less restrictive) criteria because we use new methods for measuring segregation that can provide accurate and reliable results for small subpopulations and small, nonmetropolitan communities. This is significant because it allows us to track segregation of new populations from the onset of their initial appearance as a small subpopulation in a community and then on into later decades when they may or may not become a larger presence in the community.
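The screening rule described above can be expressed as a short filter. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes that "the households in the segregation comparison" means the combined households of the two groups being compared.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Community-wide counts for one group (hypothetical container)."""
    households: int
    persons: int


def passes_screen(a: GroupCounts, b: GroupCounts,
                  min_households: int = 50,
                  min_persons: int = 150,
                  min_share: float = 0.005) -> bool:
    """Return True if both groups meet the minimum-size requirements."""
    # Assumption: shares are taken over the two groups' combined households.
    combined = a.households + b.households
    for group in (a, b):
        if group.households < min_households or group.persons < min_persons:
            return False
        if group.households / combined < min_share:
            return False
    return True


# Assumed example counts for a small community.
white = GroupCounts(households=12_000, persons=30_000)
latino_early = GroupCounts(households=60, persons=200)   # 60 / 12,060 < 0.5%
latino_later = GroupCounts(households=80, persons=260)   # 80 / 12,080 > 0.5%

print(passes_screen(white, latino_early))  # False
print(passes_screen(white, latino_later))  # True
```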

Most studies of segregation adopt much higher thresholds for screening cases. For example, many adopt requirements that both groups have a minimum of 3000–5000 persons and represent at least 3–5 percent of the community population. These restrictions would preclude the study of segregation in most nonmetropolitan new destination communities and most metropolitan ones as well. Importantly, these screening criteria are not adopted based on substantive concerns. Instead, the primary motivation for adopting these restrictions is that researchers correctly fear conventional practices for measuring segregation will yield misleading index scores when one group in the analysis is small in absolute or relative terms. Similar concerns do not apply to the refined measures we use because we compute segregation index scores using Fossett’s (2017) difference-of-means framework, which includes refinements that eliminate the problem of index bias when measuring segregation for small groups. A second advantage of this framework is that it formulates segregation index scores as a group disparity (i.e., a difference of means) on residential attainments, thus permitting calculation of standard errors and conventional tests of statistical significance of departure from the null hypothesis of no group difference. In principle, we could use even lower thresholds on population counts. But the limiting factor is that scores based on even lower counts would have high sample-to-sample volatility (large standard errors) under the null hypothesis of no group difference and thus low statistical power (i.e., limited ability to reliably detect true effects that are small-to-moderate in size). Nevertheless, we are able to use more inclusive criteria for screening cases and retain a much larger number of combinations of group comparisons across a larger number of communities.
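The difference-of-means idea can be illustrated with a short sketch for the separation index. Each household is scored on the proportion White in its block, and S is the White mean minus the comparison-group mean of that score. The bias correction shown here, which scores each household on the composition of its neighbors by leaving the household itself out of its block's counts, is an assumption made for illustration; Fossett (2017) should be consulted for the exact formulas used in the study. The function name and the data are hypothetical.

```python
def separation_dom(w, b, exclude_self=False):
    """S as a difference of group means on block proportion White.

    exclude_self=True scores each household on its neighbors only
    (an assumed form of bias correction, for illustration).
    """
    sum_w = sum_b = 0.0
    n_w = n_b = 0
    for wi, bi in zip(w, b):
        ti = wi + bi
        if ti == 0:
            continue
        if exclude_self:
            if ti < 2:
                continue  # a lone household has no neighbors; dropped here
            y_w = (wi - 1) / (ti - 1)  # score for a White household
            y_b = wi / (ti - 1)        # score for a comparison-group household
        else:
            y_w = y_b = wi / ti        # conventional score, household included
        sum_w += wi * y_w
        n_w += wi
        sum_b += bi * y_b
        n_b += bi
    return sum_w / n_w - sum_b / n_b


# Assumed data: four blocks of two households each, 4 White and 4 other.
white = [1, 1, 2, 0]
other = [1, 1, 0, 2]

s_conventional = separation_dom(white, other)
s_neighbors = separation_dom(white, other, exclude_self=True)
print(s_conventional, s_neighbors)  # 0.5 0.0
```

With blocks this small, the conventional score is inflated because every household counts toward the composition of its own block. Scoring households on neighbors only removes that built-in contribution, which is the intuition behind why small units and small groups are problematic for conventional index scores.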

2.4.5 Sources of Data and Microunits

The data we use for our study are drawn from census summary file tabulations of group counts across census blocks. More specifically, the main tabulations we use to obtain data on households and persons are taken from Summary File 1B and the PL-94 voter redistricting file from the 1990 Census, Summary File 1 for the 2000 Census, and Summary File 1 from the 2010 Census. These block-level tabulations provide complete coverage of the United States based on full (100%) counts, not sample data. The data sources for each year provide block-level counts of the number of persons and households for the groups considered in our study – White, Black, Latino, and Asian groups. We primarily focus on results obtained using data for households. But in a case study in Chap. 5, we also review results obtained using data for persons in order to establish methodological points and to have points of comparison with previous research which typically uses data for persons. We use these block-level data tabulations to calculate index scores that assess the level of segregation of households by race/ethnicity for individual communities in 1990, 2000, and 2010. In 2000 and 2010, the available data tabulations for households break counts for households out by household size. These data play an important role in methodological analyses we discuss later in this chapter. These methodological analyses establish that segregation index scores computed using data for households are superior to similar scores using data for persons because it is feasible to use methods for direct calculation of unbiased segregation index scores using data for households, while this is not feasible using data for persons. We provide a review of this issue later in this chapter.

Relatedly, the data sources we use also provide block-level counts of persons for the groups considered in our study. We use these data to calculate index scores that assess the level of segregation of persons by race/ethnicity for individual communities in 1990, 2000, and 2010. To be clear, we computed and examined these scores for purposes of methodological analysis. We do not focus on results based on tabulations of persons in our main analysis chapters because segregation index scores calculated using person data are inferior to the index scores that we compute using data for households. Specifically, scores calculated using data for persons are distorted by inherent index bias to a much greater degree than is generally recognized by segregation researchers. Additionally, while it is technically possible to obtain unbiased segregation index scores calculated using data for persons under certain conditions, the methods for achieving this desirable result require detailed tabulations of persons by race and size of household that are not available for 1990. Moreover, even when requisite data are available, methods for obtaining unbiased index scores using data for persons are demanding and impractical for general adoption in empirical research. The procedures involve either applying complicated formulas to implement direct calculation methods per Fossett (2017) or using indirect norming procedures that require complex bootstrap simulation methods. Either approach is impractical because the scores obtained correspond closely to unbiased index scores that can be obtained by applying simpler calculations using data for households.

2.4.6 Spatial Units for Assessing Segregation Within Communities

We noted earlier in this chapter that we measure residential segregation using data for census blocks. Census blocks are the smallest spatial units for which relevant census tabulations are available. This brings a major advantage; namely, it makes it possible for us to measure segregation more accurately in nonmetropolitan communities such as micropolitan areas and noncore counties and also in smaller metropolitan areas. Many segregation studies in recent decades have used the much larger spatial unit of census tracts. This is probably not a concern in large metropolitan areas where segregation patterns play out across relatively expansive spatial domains that are adequately captured by census tracts. But census tracts are unacceptable units for studying segregation in smaller communities. For those who may have concerns about measuring segregation using census blocks, we offer the following reassurances.

  • We calculate scores for segregation indices using data for households at the block level using new, refined methods that yield scores that are unbiased at the initial point of measurement. The resulting scores require no adjustment to be used as point estimates of the level of segregation in the community. Additionally, the scores can be used as is (without adjustment) to sustain close analysis of individual cases – including, for example, obtaining standard errors and confidence intervals for the point estimate – and direct case-to-case comparison of any two or more scores. Significantly, the claims we make here cannot be made for the segregation index scores used in previous studies.

  • Segregation measured at the block level makes it possible for us to directly compare scores across all communities ranging from small noncore counties to the largest metropolitan areas. Index scores computed using block-level data and tract-level data will be similar for large metropolitan areas because segregation within tracts is a small fraction of the segregation in the community overall (Amaro, 2016). Index scores computed using tract-level data will be misleadingly and unacceptably low in smaller communities where the portion of overall segregation occurring within tracts is large and often exceeds the portion of segregation occurring across tracts (Amaro, 2016). Consequently, unbiased index scores based on block-level data are directly comparable across small and large communities. This is not true for scores based on data for tracts.

  • Segregation at the block level is sociologically relevant for group differences on a wide range of location-based outcomes and associated life chances. Segregation at higher spatial scales – for example, segregation across school districts, across urban places within large metropolitan areas, between central city and suburbs, and across counties, states, and regions – all can be a legitimate focus of research (e.g., Fischer & Massey, 2004). This, of course, does not diminish the relevance and potential importance of segregation at the block level within communities.
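
The second point above, that tract-level data can hide segregation occurring within tracts, can be illustrated with a toy community. The sketch below is hypothetical: the counts, the assignment of blocks to tracts, and the helper name `dissimilarity` are invented for illustration. It uses the conventional (not the unbiased) formula for D, half the sum of absolute differences in the two groups' shares across units, only to show the effect of aggregation.

```python
def dissimilarity(units):
    """Conventional dissimilarity index D for a list of (n1, n2) count pairs."""
    total_1 = sum(n1 for n1, _ in units)
    total_2 = sum(n2 for _, n2 in units)
    return 0.5 * sum(abs(n1 / total_1 - n2 / total_2) for n1, n2 in units)


# Hypothetical community: two tracts, each made up of two blocks.
# Within each tract the two groups live on separate blocks.
blocks_by_tract = {
    "tract_a": [(10, 0), (0, 10)],
    "tract_b": [(10, 0), (0, 10)],
}

# Block-level units: every block in the community.
blocks = [b for tract in blocks_by_tract.values() for b in tract]

# Tract-level units: block counts summed within each tract.
tracts = [
    (sum(n1 for n1, _ in tract), sum(n2 for _, n2 in tract))
    for tract in blocks_by_tract.values()
]

d_block = dissimilarity(blocks)  # the groups share no blocks
d_tract = dissimilarity(tracts)  # each tract mirrors the community mix
print(d_block, d_tract)
```

In this constructed case all of the segregation occurs within tracts, so the tract-level score registers none of it while the block-level score registers complete segregation.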

2.5 Segregation Index Bias: Overview, Background, and Solutions

A relatively small number of indices are used to measure the evenness dimension of segregation in empirical studies. These include the Gini index (G), the dissimilarity index (D), the Hutchens square root index (R), the Theil entropy-based (or information theoretic) index (H), and the separation index (S) (which is known by many other names).Footnote 5 Of these, we present results for D and S, which we discuss in more detail below, because they are the two best known and most widely used measures. Additionally, they both have relatively simple computing formulas and they both have attractive substantive interpretations that are easy to convey to broader audiences. Also, when considered together, D and S are effective in capturing two distinctive aspects of segregation registered by measures of uneven distribution. The first aspect is group differences in distribution across neighborhoods ranked at the most basic level as simply being “below-parity” or “at-or-above-parity.” The second aspect is whether displacement into non-parity neighborhoods involves group separation as occurs when uneven distribution is polarized, that is, when groups live apart from each other in different neighborhoods that are polarized on group composition.

The first aspect of uneven distribution is group inequality or disparity on the simple outcome of attaining parity-level contact with the reference group. The dissimilarity index captures this well because it is sensitive to all departures from even distribution whether large or small. More technically, D is equal to the index of net difference (ND), a measure of inter-group inequality on ordinal outcomes (Lieberson, 1976), applied to a three category neighborhood ranking of below-parity (p < P), parity (p = P), and above parity (p > P) where p and P are the reference group’s proportion in the area population and overall population, respectively (Fossett, 2017). The relevant point is this; D registers living in below parity areas in the same way whether the area is below parity to the maximum possible degree (100% minoritized group) or merely 0.1 point below parity. Scores for D correlate very closely with scores for G and R (r > 0.96 in our data) which also respond strongly to this aspect of uneven distribution.Footnote 6 The separation index measures quantitative group inequality on p, not ordinal inequality (Fossett, 2017). Accordingly, S is more effective at capturing the group separation aspect of uneven distribution because S is more sensitive to the larger departures from even distribution that occur when areas are racially polarized. Scores for S also correlate relatively closely with scores for H (r > 0.79 in our data) which is the next best alternative for assessing this aspect of uneven distribution (Fossett, 2017).

In short, measures of uneven distribution fall into two groups; those that are sensitive to rank-order differences on area group composition and those that are sensitive to quantitative differences. D represents the first group reasonably well,Footnote 7 while S represents the second group. We provide a more detailed review of how D and S register segregation patterns later in this chapter. Here we focus on a characteristic they both have in common with all measures of uneven distribution; they are inherently biased in the following sense – they have positive expected values under conditions where households are distributed randomly across residential locations. Significantly, the bias inherent in indices measuring uneven distribution is not “fixed” or constant; it varies in magnitude from one index to another and the magnitude of bias for any given index varies in complex ways across circumstances of measurement such as variation in the relative size of the groups in the comparison, variation in the size of spatial units, variation in the relative presence of other groups in the overall population, and variation in patterns of segregation involving groups not in the comparison of interest (Fossett, 2017).Footnote 8

The problem of index bias was first identified in the 1960s and 1970s and gained wide appreciation following an influential study by Winship (1977) which established that index bias for D was very high when segregation was measured using small spatial units and groups were imbalanced in size. Prior to this time, many empirical studies of segregation used index scores calculated using block-level data for households (housing units) from tabulations published in the reports of the decennial census of housing (e.g., Taeuber & Taeuber, 1965; Roof, 1972; Van Valey & Roof, 1976; Sørensen et al., 1975). Spurred in part by concerns about the magnitude of index bias when counts for groups by spatial units are small, researchers moved from using data for census blocks to using data for census tracts, which are much larger. This change did provide some protection from the most severe problems of index bias, but it imposed a great cost on segregation research. It effectively precluded the possibility of studying segregation in all nonmetropolitan settings and even in many smaller metropolitan communities where spatial units that are the size of census tracts are too large to accurately capture segregation patterns.

The change from using block data to using tract data also was accompanied by a second change from using data for households (housing units) to using data for persons (individuals). This change resulted in much larger group counts for spatial units. On first consideration this might be seen as providing protection from index bias. In fact, it does not. To the contrary, as we discuss in more detail below, the shift to using data for persons provides no benefit on index bias over using data for households. Furthermore, it can potentially do more harm than good if it leads some researchers to have a false sense of security regarding the undesirable impact of index bias.

The problem of index bias is widely recognized, but for decades it defied a viable solution. As a direct consequence, segregation researchers began to adopt a variety of ad hoc procedures for dealing with index bias by indirect means. Fossett (2017) reviews these procedures in detail and comes to a blunt conclusion. There is little rigorous evidence to indicate that the ad hoc procedures are effective in dealing with index bias. Evidence on this point would require methodological studies demonstrating that findings using biased scores with ad hoc procedures yield results comparable to those obtained using unbiased scores. But these studies do not exist. Instead, ad hoc procedures, while adopted with good intentions, are at best weakly justified. What the ad hoc procedures primarily accomplish is to restrict the focus of research to a subset of circumstances where bias is potentially kept to an acceptable if not negligible level. The most common ad hoc practices for dealing with bias are to exclude cases where scores are thought to be most distorted by bias and then to weight remaining cases differentially to give greater weight to cases whose index scores are viewed as less distorted by bias.

These procedures have three undesirable consequences. First, they dramatically restrict the scope of segregation studies because many research questions involve group comparisons that are typically excluded from consideration. For example, excluding cases prone to higher levels of bias precludes studying segregation in nonmetropolitan and rural settings where segregation must be measured using small spatial units and studying segregation of new groups that by their nature are small in absolute and/or relative size. The second negative consequence is that use of ad hoc case weighting procedures skews results of empirical analyses in the direction of results obtained for larger communities with larger minoritized group populations. This raises a concern that these cases are not representative of the much larger number of group comparisons that are excluded from empirical studies. Third, the ad hoc procedures create a false sense of security when in fact the index scores are subject to bias and, barring evidence to the contrary, are potentially untrustworthy for close analysis, thus making close comparison of scores across cases a questionable exercise.

Happily, concerns about the efficacy, or lack thereof, of ad hoc procedures can be set aside. New methods introduced by Fossett (2017) provide refined formulas for all popular indices of uneven distribution that yield scores that are unbiased across a broad range of measurement circumstances. The scores obtained using these refined unbiased formulas have many desirable properties. First and foremost, they have the crucial property of being unbiased; specifically, they have expected values of zero when groups are distributed randomly across residential locations. Second, because the scores are unbiased, scores for individual cases do not require screening or differential weighting to deal with distortions resulting from index bias. Third, unbiased scores can support close analysis of cases as comparisons across cases are no longer complicated by index bias. Fourth, the method for obtaining unbiased scores draws on formulas that express the indices as a simple group difference of means on the racial composition of a household’s neighbors. This has the desirable consequence of placing segregation index scores in a group disparity analysis framework where any given index score can be evaluated using statistical methods that permit calculation of standard errors and confidence intervals for individual index scores and, if desired, tests of statistical significance for departure from the expected value of zero under a null hypothesis of independence of group membership and relevant residential outcomes.Footnote 9 Finally, the unbiased scores support substantive interpretations that are appealing and easy to explain.

We note only one caveat to these otherwise encouraging points. It is that unbiased scores are relatively easy to calculate when using readily available tabulations of households by race for blocks or other spatial units. But unbiased scores are not easy to calculate when using similar tabulations for persons. Instead, calculations using data for persons require data that are more detailed in combination with more complicated calculation formulas.

2.5.1 A Somewhat Technical Review of the Origins of Index Bias

Our discussion here summarizes points from Fossett (2017) regarding the origins of index bias. The first step in Fossett’s analysis is to establish that all widely used measures of uneven distribution (i.e., G, D, R, H, and S) can be formulated in a simple difference-of-means disparity framework. In this framework the index score (IS) for any widely used measure can be obtained using a generic formula that computes a group difference of means on residential outcomes (y) experienced by the individual households in each group. The disparities formula for all popular indices is the following simple expression.

$$ IS={\overline{Y}}_1-{\overline{Y}}_2=\left(1/{N}_1\right)\cdot \Sigma\ {n}_{1i}{y}_i-\left(1/{N}_2\right)\cdot \Sigma\ {n}_{2i}{y}_i $$

To implement the formula one of the two groups is arbitrarily designated as the “reference group” in the comparison (indexed by “1” in the terms in the formula) and the other group is designated as the “comparison group” (indexed by “2” in the terms in the formula). In analyses of majority-minority segregation it would be conventional to adopt the majority group as the “reference group” in the comparison. But, ultimately, it is an arbitrary choice as the results of the calculation will be identical regardless of which group is designated as the reference group. The subscript i is an index for spatial units (e.g., census blocks); the terms n1i and n2i indicate the counts for the reference group and the comparison group in a given spatial unit i; and the terms N1 and N2 are the counts for the reference group and the comparison group in the community overall. To this point, all of the terms are the same regardless of which index is being calculated. The next term in the formula – yi – has a generic interpretation but is obtained via a unique calculation specific to the index being used. The generic interpretation of yi is that it is the value of a residential outcome related to area group composition (pi) that is scored separately for individual households. Fossett refers to this term as “scaled contact with the reference group” because yi is scored as a positive, monotonic (sometimes rising, never falling) function of the relative presence (proportion) of the reference group (pi) in area i obtained from the simple calculation pi = n1i/(n1i + n2i). In this framework, all measures of the uneven distribution dimension of segregation can be characterized as measures of group disparity on scaled contact with the reference group. The differences between particular measures trace to a single factor; namely, how values of “scaled contact with the reference group” (yi) are scored from simple contact with the reference group (pi).

Fossett (2017) derives the specific scaling functions – yi = f(pi) – needed to generate the scores of any widely used index of uneven distribution using the difference-of-means computing formula. Here we focus only on the scaling functions that are relevant for the two indices we will consider in our study; namely, the dissimilarity index (D) and the separation index (S). The scaling function for D is: score yi = 1 when pi ≥ P and yi = 0 when pi < P where P is the relative representation (proportion) of the reference group in the combined overall population of the two groups (i.e., P = N1/(N1 + N2)). In substantive terms, the scaling function for D recodes continuous scores on pi, the reference group proportion in the area where the household resides, into values of a dummy variable coded 0 or 1 based on whether the value of pi equals or exceeds parity (i.e., pi ≥ P) for the relative presence of the reference group in the community overall. Accordingly, the value of D obtained from the difference-of-means group disparity formulation \( D={\overline{Y}}_1-{\overline{Y}}_2 \) registers the group difference in achieving parity-level contact with the reference group.

The scaling function for S is also simple. For S, the function is yi = pi. Thus, the value of S obtained from the difference-of-means group disparity formulation \( S={\overline{Y}}_1-{\overline{Y}}_2 \) registers the group difference in simple (unmodified) contact with the reference group. Comparison of the scaling functions for D and S clarifies the essential difference between the measures. Where S registers group differences on contact with the reference group in their “raw”, unmodified form, D registers group differences in contact with the reference group after first collapsing them into two values, 0 or 1, based on whether contact reached parity. This reveals why D is less sensitive than S to whether the group differences in contact with the reference group are quantitatively large or small.
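
The difference-of-means formulation and the two scaling functions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the block counts are hypothetical, and the conventional formulas used as a cross-check (half the sum of absolute share differences for D, the variance-ratio form for S) are not given in the text and are included here as an assumption about what "standard" scores refer to.

```python
def difference_of_means(blocks, scale):
    """Index score as mean(y) for group 1 minus mean(y) for group 2.

    blocks: list of (n1, n2) household counts per spatial unit.
    scale:  function mapping (p_i, P) to scaled contact y_i.
    """
    total_1 = sum(n1 for n1, _ in blocks)
    total_2 = sum(n2 for _, n2 in blocks)
    overall_p = total_1 / (total_1 + total_2)
    mean_1 = mean_2 = 0.0
    for n1, n2 in blocks:
        if n1 + n2 == 0:
            continue  # empty units contribute nothing
        y = scale(n1 / (n1 + n2), overall_p)
        mean_1 += n1 * y / total_1
        mean_2 += n2 * y / total_2
    return mean_1 - mean_2


def scale_d(p, overall_p):
    """D: 1 if contact is at or above parity, else 0."""
    return 1.0 if p >= overall_p else 0.0


def scale_s(p, overall_p):
    """S: contact with the reference group, unmodified."""
    return p


def standard_d(blocks):
    """Conventional formula for D, used only as a cross-check."""
    total_1 = sum(n1 for n1, _ in blocks)
    total_2 = sum(n2 for _, n2 in blocks)
    return 0.5 * sum(abs(n1 / total_1 - n2 / total_2) for n1, n2 in blocks)


def standard_s(blocks):
    """Conventional variance-ratio formula for S, used only as a cross-check."""
    total_1 = sum(n1 for n1, _ in blocks)
    total_2 = sum(n2 for _, n2 in blocks)
    total = total_1 + total_2
    overall_p = total_1 / total
    between = sum(
        (n1 + n2) * (n1 / (n1 + n2) - overall_p) ** 2
        for n1, n2 in blocks
        if n1 + n2 > 0
    )
    return between / (total * overall_p * (1 - overall_p))


blocks = [(8, 2), (5, 5), (1, 9), (6, 4)]  # hypothetical counts
d_score = difference_of_means(blocks, scale_d)
s_score = difference_of_means(blocks, scale_s)
print(d_score, standard_d(blocks))
print(s_score, standard_s(blocks))
```

For these counts the two routes to each index agree, which is the sense in which the difference-of-means expression is a reformulation of the familiar indices rather than a new measure. Only the scaling function changes between D and S.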

Winship (1977) notes it is peculiar that standard formulas for measures of uneven distribution such as D and S are calibrated to take values of zero (0) only under the condition of exact even distribution. There are multiple reasons why exact even distribution is a questionable baseline for integration. One is that in practice exact even distribution is often logically impossible to achieve, especially when counts of households for areal units are small.Footnote 10 Another is that exact even distribution is an unusual, precise pattern that would only be expected under structured residential assignments (e.g., quota allocation), not under random distribution. Relatedly, and more importantly for our purposes, the reference point of even distribution is conceptually different from the reference point adopted in most research on group disparities in socioeconomic outcomes. Even distribution is a stylized pattern of outcomes for spatial units, not individuals and households. The conventional approach for evaluating group disparities is to examine whether an observed group difference on average attainment outcomes for households and individuals differs from zero, the value expected under conditions of the null hypothesis that group membership and attainment outcomes are statistically independent. In keeping with this perspective, Winship (1977) argues an unbiased segregation index should have an expected value of zero under random distribution. But this is not the case for any widely used measure of uneven distribution. Instead, they all have positive expected values under random distribution and this quality of upward bias makes standard measures of uneven distribution flawed for measuring group disparity in the residential outcome of group composition.

Placing segregation index scores in a difference-of-means, or group disparities, framework brings an important benefit; it helps identify both the source of index bias and also the relatively simple refinement that can eliminate index bias. The core issue is this. In the group disparities framework, an index will be unbiased if the expected value of the group disparity on location-based attainments (yi) is zero (0) under random assignment. This is not the case for any index measuring uneven distribution; all take positive, sometimes large, expected values under random assignment. Fossett (2017) shows that index bias traces to a single source; the initial calculation of the value of a household’s simple (unmodified) contact with the reference group (pi). The problem is that the expected distribution of values on simple contact (pi) under random distribution is not the same for households in both groups; it is systematically higher for households in the reference group and it is systematically lower for households in the comparison group. Since simple contact (pi) is the “raw material” used in calculating values of scaled contact (yi), it logically follows that expected values for scores on scaled contact (yi) also must necessarily be higher for the reference group and lower for the comparison group, thus necessarily producing a positive expected value for the index score.Footnote 11

The core insight is that the standard approach to calculating the value of simple contact with the reference group (pi) for a given household has two components. The first component is the household’s contact with the reference group resulting from contact with neighbors (pni). Under basic sampling theory the expected distribution of values for this portion of contact is the same for households from both groups so it does not create bias in the index score.Footnote 12 The second component is contact with the reference group that results from self-contact (psi); that is, contact determined by the household’s own presence in the group counts for the area. Sampling theory is not relevant for the expected value of this portion of contact; it is fixed as same-race contact as determined by the group membership of the household in question. Accordingly, this portion of contact is systematically different for households in the two groups; it takes a positive value for households in the reference group and it takes a value of zero for households in the comparison group. The resulting positive difference across groups is the source of bias in measures of uneven distribution.

This can be stated more carefully as follows. First, start with the calculation of simple contact with the reference group for any given focal household (as noted earlier):

$$ {p}_i={n}_{1i}/\left({n}_{1i}+{n}_{2i}\right). $$

Then re-express the count terms in the calculation as counts for neighbors (subscript “n”) and self (subscript “s”) as follows:

$$ {p}_i=\left({n}_{1 ni}+{n}_{1 si}\right)/\left({n}_{1 ni}+{n}_{2 ni}+{n}_{1 si}+{n}_{2 si}\right). $$

The terms n1ni and n2ni are the counts of the reference group and the comparison group among the household’s neighbors. The terms n1si and n2si register the presence and group membership of the focal household; n1si is set to 1 if the household is in the reference group and 0 if not, and, similarly, n2si is set to 1 if the household is in the comparison group and 0 if not. Since the sum of the self-presence terms n1si and n2si is always 1, the expression can be simplified to:

$$ {p}_i=\left({n}_{1 ni}+{n}_{1 si}\right)/\left({n}_{1 ni}+{n}_{2 ni}+1\right). $$

For convenience, the denominator can be designated as nt – the number of households in the area. Then contact with the reference group (pi) can be expressed as the sum of contact with reference group neighbors (pni) and contact with the reference group resulting from self-contact (psi).

$$ {p}_i=\left({n}_{1 ni}+{n}_{1 si}\right)/{n}_t $$
$$ {p}_i={n}_{1 ni}/{n}_t+{n}_{1 si}/{n}_t $$
$$ {p}_i={p}_{ni}+{p}_{si} $$
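
As a hypothetical illustration (the counts are invented for this example), take a focal household in an area of nt = 5 households whose four neighbors include two reference-group households, so that pni = 2/5. Contact with the reference group then depends on the focal household's own group:

$$ {p}_i=2/5+1/5=3/5\ \mathrm{if}\ \mathrm{the}\ \mathrm{household}\ \mathrm{is}\ \mathrm{in}\ \mathrm{the}\ \mathrm{reference}\ \mathrm{group},\mathrm{and} $$
$$ {p}_i=2/5+0/5=2/5\ \mathrm{if}\ \mathrm{the}\ \mathrm{household}\ \mathrm{is}\ \mathrm{in}\ \mathrm{the}\ \mathrm{comparison}\ \mathrm{group}. $$

The neighbors are the same in both cases; the difference of 1/5 comes entirely from self-contact.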

Under random assignment, a focal household’s neighbors will be a random draw from the community so the expected distribution of values of pni will be identical for both groups and the expected mean for both groups will be equal to the relative presence of the reference group in the community (P). Consequently, this component of contact has no impact on the expected value of the group difference on contact with the reference group.

In contrast, the expected value of psi is 1/nt for households in the reference group and 0/nt for households in the comparison group, resulting in an expected group difference of 1/nt. The expected distribution of values of contact associated with neighbors (pni) is identical for households from both groups. The addition of self-contact (psi) systematically shifts the expected distribution of scores of simple contact (pi) up for all households in the reference group but has no impact on the expected distribution of scores for households in the comparison group. As a consequence, expected values on scaled contact (yi) scored from simple contact (pi) are systematically higher for households in the reference group and thus the expected group disparity (difference of means) is positive and biased upward.Footnote 13
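
The argument can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of any analysis in the study: the community size (900 reference and 100 comparison households), the equal block size of five, the number of replications, and the function names are all hypothetical choices. Households are shuffled across blocks at random and the standard versions of D and S are averaged over replications.

```python
import random


def standard_scores(blocks):
    """Standard (biased) D and S from (n1, n2) counts via difference of means."""
    total_1 = sum(n1 for n1, _ in blocks)
    total_2 = sum(n2 for _, n2 in blocks)
    overall_p = total_1 / (total_1 + total_2)
    d = s = 0.0
    for n1, n2 in blocks:
        p = n1 / (n1 + n2)                    # contact including self
        weight = n1 / total_1 - n2 / total_2  # group difference in shares
        d += weight * (1.0 if p >= overall_p else 0.0)
        s += weight * p
    return d, s


def simulate_random_assignment(n_ref=900, n_cmp=100, block_size=5,
                               replications=200, seed=12345):
    """Average standard D and S when households are shuffled across blocks."""
    rng = random.Random(seed)
    households = [1] * n_ref + [0] * n_cmp  # 1 = reference group
    sum_d = sum_s = 0.0
    for _ in range(replications):
        rng.shuffle(households)
        blocks = []
        for start in range(0, len(households), block_size):
            members = households[start:start + block_size]
            n1 = sum(members)
            blocks.append((n1, len(members) - n1))
        d, s = standard_scores(blocks)
        sum_d += d
        sum_s += s
    return sum_d / replications, sum_s / replications


mean_d, mean_s = simulate_random_assignment()
print(f"mean standard D = {mean_d:.3f}, mean standard S = {mean_s:.3f}")
```

With five households per block, the expected group difference of 1/nt implies an average S near 0.2 even though assignment is random, and the average for D is larger still because the 0/1 scaling turns the small shift in contact into a difference in reaching parity.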

2.5.2 The Simple Refinement to Index Calculations that Yields Unbiased Index Scores

Fossett’s (2017) analysis of segregation index bias in the group difference-of-means framework not only pinpoints the source of bias, it also establishes how the formulas for calculating scores for indices of uneven distribution can be refined to eliminate bias. The solution for eliminating index bias is to exclude the impact of self-contact from the simple contact calculations. That is accomplished by revising the standard contact calculation

$$ {p}_i=\left({n}_{1 ni}+{n}_{1 si}\right)/\left({n}_{1 ni}+{n}_{2 ni}+{n}_{1 si}+{n}_{2 si}\right) $$

by subtracting out the terms associated with self-contact. Because the area counts include the focal household (n1i = n1ni + n1si and n2i = n2ni + n2si), removing the self-contact terms from the area counts leaves only the counts for neighbors. This leads to the following expression for unbiased contact with the reference group (\( {p}_i^{\prime } \)):

$$ {p}_i^{\prime }=\left({n}_{1i}-{n}_{1 si}\right)/\left({n}_{1i}+{n}_{2i}-{n}_{1 si}-{n}_{2 si}\right)={n}_{1 ni}/\left({n}_{1 ni}+{n}_{2 ni}\right). $$

As noted earlier, the sum of the terms n1si and n2si is always 1, so the expression can be restated as:

$$ {p}_i^{\prime }=\left({n}_{1i}-{n}_{1 si}\right)/\left({n}_{1i}+{n}_{2i}-1\right). $$

In practice, segregation index scores are computed using datasets that provide counts of households by group for spatial units. This supports efficient calculation of index scores. In this context the calculation formula can be applied to area counts as follows:

$$ {p}_i^{\prime }=\left({n}_{1i}-1\right)/\left({n}_{1i}+{n}_{2i}-1\right)\ \mathrm{for}\ \mathrm{households}\ \mathrm{in}\ \mathrm{the}\ \mathrm{reference}\ \mathrm{group},\mathrm{and} $$
$$ {p}_i^{\prime }=\left({n}_{1i}-0\right)/\left({n}_{1i}+{n}_{2i}-1\right)\ \mathrm{for}\ \mathrm{households}\ \mathrm{in}\ \mathrm{the}\ \mathrm{comparison}\ \mathrm{group} $$

When this measure of unbiased contact is used as the raw input to the difference-of-means computing formulas for the most widely used measures of uneven distribution – G, D, R, H, and S – the resulting scores for these indices are unbiased; they take expected values of zero (0) when households are distributed randomly across residential locations (Fossett, 2017).
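
The area-count formulas can be sketched in Python for D and S as follows. This is a hedged illustration rather than the authors' code: the function names, the simulated community (900 reference and 100 comparison households in blocks of five), and the seed are hypothetical. The handling of single-household blocks is also an assumption of the sketch, since the formulas are undefined when a household has no neighbors; such blocks are dropped here and P is computed from the households that remain.

```python
import random


def unbiased_scores(blocks):
    """Unbiased D and S from (n1, n2) household counts per block.

    Contact is computed from neighbors only:
      reference household:   (n1 - 1) / (n1 + n2 - 1)
      comparison household:   n1      / (n1 + n2 - 1)
    Assumption: single-household blocks (no neighbors) are dropped.
    """
    usable = [(n1, n2) for n1, n2 in blocks if n1 + n2 >= 2]
    total_1 = sum(n1 for n1, _ in usable)
    total_2 = sum(n2 for _, n2 in usable)
    overall_p = total_1 / (total_1 + total_2)
    d_1 = d_2 = s_1 = s_2 = 0.0
    for n1, n2 in usable:
        p_ref = (n1 - 1) / (n1 + n2 - 1)  # seen by reference households
        p_cmp = n1 / (n1 + n2 - 1)        # seen by comparison households
        s_1 += n1 * p_ref / total_1
        s_2 += n2 * p_cmp / total_2
        d_1 += n1 * (1.0 if p_ref >= overall_p else 0.0) / total_1
        d_2 += n2 * (1.0 if p_cmp >= overall_p else 0.0) / total_2
    return d_1 - d_2, s_1 - s_2


def mean_unbiased_under_random_assignment(n_ref=900, n_cmp=100, block_size=5,
                                          replications=200, seed=2017):
    """Average unbiased D and S when households are shuffled across blocks."""
    rng = random.Random(seed)
    households = [1] * n_ref + [0] * n_cmp  # 1 = reference group
    sum_d = sum_s = 0.0
    for _ in range(replications):
        rng.shuffle(households)
        blocks = []
        for start in range(0, len(households), block_size):
            members = households[start:start + block_size]
            n1 = sum(members)
            blocks.append((n1, len(members) - n1))
        d, s = unbiased_scores(blocks)
        sum_d += d
        sum_s += s
    return sum_d / replications, sum_s / replications


segregated = unbiased_scores([(5, 0), (0, 5)])  # groups share no blocks
mean_d, mean_s = mean_unbiased_under_random_assignment()
print(segregated)
print(f"mean unbiased D = {mean_d:.3f}, mean unbiased S = {mean_s:.3f}")
```

Under random assignment both averages should sit close to zero, while the completely segregated toy case still scores at the maximum, so removing self-contact does not blunt the indices where segregation is real.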

Fossett (2017) reviews further mathematical and empirical analyses to establish additional findings regarding the nature of index bias. In Table 2.1 we note five findings that apply to D and S, the measures of uneven distribution we use in our analyses. The last point is the most important one: One is never worse off for using unbiased index scores. If standard scores are not distorted by bias, the scores obtained using the unbiased version of the index will match the scores obtained using the standard version of the index. If standard scores are distorted by bias, they will be untrustworthy, and scores obtained using the unbiased version should be preferred.

Table 2.1 Selected Differences Between Standard (Biased) and Unbiased Scores for the Dissimilarity Index (D) and the Separation Index (S)

We will draw on these points in Table 2.1 and related factors to explain why findings obtained using the unbiased versions of D and S can, and in many situations do, differ from findings obtained using standard (biased) versions of D and S. For example, the results we obtain using the unbiased version of D indicate White-Latino segregation is stable or rising over time in new destination communities. In contrast, the results obtained using the standard (biased) version of D suggest the opposite: that White-Latino segregation is falling over time in new destination communities. The difference between results is due to the complex impact of index bias on scores obtained using standard computing formulas for D. By definition, Latino presence in new destination communities is initially low but it is rising over time. This means P – the relative presence of White households (the reference group) in the community – is initially close to its upper boundary of 1.0 but is falling and moving closer to 0.50 over time. All else equal, the value for the standard (biased) version of D will fall because the magnitude of bias in D will fall as P moves closer to 0.50. This example illustrates why it is important to use the unbiased versions of D and S to track trends in segregation and differences across communities.
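
The dependence of bias in standard D on P can be illustrated with a short calculation. The sketch below rests on simplifying assumptions that are not part of the study: all areas hold the same number of households (five here), and each neighbor is treated as an independent draw that belongs to the reference group with probability P, a binomial approximation suited to a large community. The function name and the values of P are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_standard_d(block_size, overall_p):
    """Expected standard D under random assignment with equal-size blocks.

    Assumption: each of the block_size - 1 neighbors independently belongs
    to the reference group with probability overall_p (binomial model).
    overall_p should be a Fraction so parity comparisons are exact.
    """
    neighbors = block_size - 1
    expected = Fraction(0)
    for k in range(neighbors + 1):  # k = reference-group neighbors
        prob = (comb(neighbors, k)
                * overall_p ** k
                * (1 - overall_p) ** (neighbors - k))
        # Standard contact includes the focal household itself.
        y_ref = 1 if Fraction(k + 1, block_size) >= overall_p else 0
        y_cmp = 1 if Fraction(k, block_size) >= overall_p else 0
        expected += prob * (y_ref - y_cmp)
    return float(expected)


shares = (Fraction(95, 100), Fraction(75, 100), Fraction(50, 100))
bias_by_share = {float(p): expected_standard_d(5, p) for p in shares}
for share, bias in bias_by_share.items():
    print(f"P = {share:.2f}: expected standard D = {bias:.3f}")
```

In this stylized setting the expected value of standard D under random assignment is much higher when P is near 1.0 than when P is 0.50, so a community whose P drifts toward 0.50 would show a falling standard D even with no change in residential patterns. Because contact takes only a few discrete values in small areas, the decline need not be smooth at every intermediate value of P.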

2.6 Households as the Microunits for Measuring Segregation

As we have mentioned already, we assess segregation using data for households. This makes households, not persons, the microunits in the calculations we perform to obtain the unbiased index scores we use in our analysis. This is not the most common practice in recent decades; most studies take persons as the microunits for computing index scores. But we not only hold that using data for households is an acceptable choice; we also argue it is a superior choice. We outline the basis for our conclusion in this and following sections of this chapter. The case we make has two main points. In this section we note that data for households provide better coverage of the subpopulations that are relevant to the study of residential segregation (e.g., they include households and exclude inmates of prisons and other institutions). In later sections we note that using data for households makes it possible to obtain unbiased index scores when this is not possible using data for persons.

Calculating index scores using households as the microunits was once a routine practice in the literature. Indeed, many of the most influential studies of racial residential segregation conducted from the 1940s through the 1970s drew on reports from the U.S. Censuses of Housing that tabulated occupied housing units by race for census blocks (e.g., Taeuber & Taeuber, 1965; Roof, 1972; Roof & Van Valey, 1972; Sørensen et al., 1975). However, in more recent decades it became more common for comparative segregation studies to measure segregation using data for persons instead of data for households. One methodological reason for this was that tabulations for persons provided more detailed breakdowns on race/ethnicity. For example, the block-level tabulations in the housing censuses for 1940, 1950, and 1960 were limited to a distinction between White and nonwhite households. The crudity of this classification was a major problem at the time. But it is not a problem for our study. The 1990, 2000, and 2010 censuses all have tabulations of households by categories of race at the block level that include the group distinctions – White, Black, Latino, and Asian – considered in empirical studies of segregation across communities in the United States.Footnote 14

On the other hand, the data tabulations for households avoid multiple practical complications associated with the available data tabulations for persons. One complication is that data tabulations for persons often include several subpopulations that ideally would be excluded and whose presence can distort segregation index scores.Footnote 15 Specifically, persons in group quarters in prisons, psychiatric institutions, military barracks, college dormitories, and other settings are often included in data tabulations for persons. These subpopulations can represent large fractions of one or both groups in the comparison, and their spatial distribution reflects administrative practices rather than the social dynamics of the broader housing market. To protect against these problems, it is necessary to flag communities where these subpopulations are present and exclude cases where their impact on index scores is potentially important. This would lead to the exclusion of many dozens of cases in our study. Data for households do not include these subpopulations and thus are not subject to these problems. Accordingly, we are able to retain most of the cases that would be excluded if we used tabulations for persons and achieve greater coverage of communities.

A second complication associated with using data for persons to measure segregation is that the index scores obtained are impacted by bias to a greater degree than is generally appreciated and it is very difficult to obtain unbiased index scores using data tabulations for persons. This is the major reason why we use data for households; it is relatively easy to obtain unbiased index scores when using data for households and it is not possible to obtain unbiased index scores for all study years when using data for persons. This point is significant and rests on insights and observations that are new to the present study. In view of this, we provide a more detailed discussion of the issues in the next section.

2.6.1 Methodological Implications of Using Data for Households Versus Using Data for Persons

It is possible that studies in recent decades have used data for persons more often than data for households in part because some researchers may view index bias as a greater concern when using data for households. The potential justification for this view is that, all else equal, index bias is higher when group counts across spatial units are smaller and counts for households tend to be much smaller than counts for persons. However, any view that using data for persons instead of households provides useful protection from the distorting impact of index bias is misplaced on two important counts. The first is that, when considered carefully, the impact of index bias on segregation scores obtained using data for persons is inherently similar in magnitude to the impact of index bias on scores obtained using data for households. The second is that the available methods for obtaining unbiased index scores are easier to apply when using data for households and they are either difficult or infeasible to apply when using data for persons. Accordingly, measuring segregation using data for households is not only an acceptable practice, it is a clearly superior choice for the needs of our study.

Winship (1977) established that index bias is greater in magnitude and concern when group counts in spatial units are small. The early literature examining segregation using block-level data for households was thus open to legitimate concerns about index bias because the group counts for spatial units were indeed small. Following Winship’s (1977) influential study, research practice shifted toward using the much larger spatial units of census tracts and using data for persons instead of households. As a result, the group counts for person data at the census tract level were much larger than group counts for households at the block level and thus could be seen as providing protection against the problem of index bias. This view was partly justified as, all else equal, the magnitude of index bias is smaller when segregation is measured using census tracts instead of census blocks. Relying on tracts is not an acceptable option for our study, however, because we wish to assess segregation for smaller metropolitan areas and nonmetropolitan communities where it is necessary to use block-level data to accurately capture segregation patterns. Index bias at the block level is not a concern for us because the unbiased scores we use in our study can be calculated for blocks as easily as for tracts and the unbiased scores have the same desirable properties whether calculated using data for blocks or data for tracts.

To the extent that researchers believed empirical studies gained protection from index bias by using data for persons instead of households, they were mistaken. All else equal, data for persons do involve larger group counts than data for households. But these larger counts do not provide protection from index bias. To the contrary, the magnitude of the impact of index bias for scores computed using data for persons is similar, if not identical, to that for scores computed using data for households. Two factors account for this. One is that persons within households locate together in household-specific clusters of persons, not independently. The other is that persons within households are typically homogeneous on racial/ethnic status. These two factors combine to create fixed proportions of same-group contact – the source of bias in index scores – that are similar in magnitude whether scores are calculated using data for households or data for persons.

Earlier we noted index bias originates in the contribution of self-contact (psi) to simple contact (pi) because a household’s self-contact is fixed, cannot be randomly assigned even in principle, and varies systematically by group membership. The situation regarding fixed contact that varies by group membership is fundamentally the same when considering data for persons instead of data for households. To illustrate, we consider the situation where all blocks have 10 households. The contact experienced by an individual household consists of contact with its nine neighboring households and the household’s self-contact. Contact with neighbors can in principle be a random draw, in which case its contribution to expected contact with the reference group will follow the relative presence of the reference group in the community (P) and equal (9/10)P. This expected value is the same for households from both groups, so the expected group difference on contact with the reference group from neighbors is zero (0).

In contrast, the contribution of self-contact is fixed and it varies systematically by race. So self-contact will increase contact with the reference group (p) by 1/10 for households from the reference group and will have no impact for households in the comparison group. Thus, the expected group difference in contact with the reference group resulting from self-contact is ten percent (10%) of total contact. This is based on taking the difference between the group-specific levels of expected contact with the reference group under random assignment as follows:

$$ E[p]_1 = \left(\frac{n_t - 1}{n_t}\right) P + \frac{1}{n_t} = \left(\frac{9}{10}\right) P + \frac{1}{10} \quad \text{for households in the reference group} $$
$$ E[p]_2 = \left(\frac{n_t - 1}{n_t}\right) P + \frac{0}{n_t} = \left(\frac{9}{10}\right) P + \frac{0}{10} \quad \text{for households in the comparison group} $$

where n_t is the number of households on a block (10 in this example). The expected difference (E[p]_1 − E[p]_2) equals 1/n_t, which in this example is 1/10. This difference in expected contact with the reference group resulting from the contribution of fixed same-group contact is the sole source of index bias (Fossett, 2017).
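The arithmetic in this example can be checked with a minimal sketch. The function name `expected_contact_households` and the community share P = 0.8 are assumptions chosen only for illustration; they do not come from the study data.

```python
from fractions import Fraction

def expected_contact_households(n_t, P, in_reference_group):
    """Expected contact with the reference group under random assignment of neighbors.

    n_t: households per block; P: reference-group share of the community.
    Neighbors contribute ((n_t - 1) / n_t) * P for both groups; self-contact
    adds 1 / n_t only for households in the reference group.
    """
    neighbors = Fraction(n_t - 1, n_t) * P
    self_contact = Fraction(1 if in_reference_group else 0, n_t)
    return neighbors + self_contact

P = Fraction(8, 10)  # hypothetical community share, for illustration only
e1 = expected_contact_households(10, P, True)    # reference group
e2 = expected_contact_households(10, P, False)   # comparison group
diff = e1 - e2
print(e1, e2, diff)  # 41/50 18/25 1/10
```

The difference is 1/n_t whatever value of P is supplied, because the neighbor term is common to both groups and cancels.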

The basic character of this situation does not change when a researcher switches from using data for households to using data for persons. To illustrate why this is so we modify the example just considered by assuming the households in question all have four persons and thus the 10-household blocks all have 40 persons. We also assume persons within households are of the same race and locate together. From the point of view of an individual person, 36 of their neighbors (based on 9 households, each with 4 persons) can in principle be a random draw so also as before the expected contribution to contact with the reference group will follow the relative presence of the reference group in the community (P) and can be given as (36/40) P which reduces to (9/10) P, the same value just identified for households. As before, this expected value applies to persons from both groups, so the expected group difference in contact with the reference group from neighbors is zero (0).

If self-contact and contact with fellow household members were a random draw, its contribution to expected contact would be (4/40) P and the expected group difference would be zero (0). But both self-contact and contact with fellow household members cannot be a random draw; they are fixed same-group contact that varies systematically by race. So, the contribution of these sources of contact with the reference group (p) is 4/40 for persons from the reference group and zero (0) for persons from the comparison group. Thus, the expected group difference in contact with the reference group for persons is 4/40 = 0.10, the same value seen earlier for households. To summarize, the difference between the group-specific levels of expected contact with the reference group under random assignment is given as follows:

$$ E[p]_1 = \left(\frac{n_t - n_h}{n_t}\right) P + \frac{n_h}{n_t} = \left(\frac{36}{40}\right) P + \frac{4}{40} \quad \text{for persons in the reference group} $$
$$ E[p]_2 = \left(\frac{n_t - n_h}{n_t}\right) P + \frac{0}{n_t} = \left(\frac{36}{40}\right) P + \frac{0}{40} \quad \text{for persons in the comparison group} $$

where n_t is the number of persons on a block (40 in this example) and n_h is the number of persons in a household (4 in this example). Accordingly, the expected difference (E[p]_1 − E[p]_2) equals n_h/n_t, which in this example is 4/40 = 1/10, the same as for households.
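A short sketch makes the equivalence of the household and person calculations explicit. The function name `expected_gap` is hypothetical. The third calculation, which treats each person as locating independently, is included only as an illustrative contrast and is an assumption of this sketch rather than a result reported in the text.

```python
from fractions import Fraction

def expected_gap(n_t, n_h):
    """Expected group difference in contact with the reference group.

    n_t: microunits (households or persons) per block.
    n_h: microunits that locate together as one fixed same-group cluster
         (1 when households are the microunits; household size for persons).
    The neighbor term ((n_t - n_h) / n_t) * P is the same for both groups
    and cancels, leaving only the fixed same-group share n_h / n_t.
    """
    return Fraction(n_h, n_t)

gap_households = expected_gap(n_t=10, n_h=1)  # 10 households per block
gap_persons = expected_gap(n_t=40, n_h=4)     # same blocks counted as 40 persons
gap_naive = expected_gap(n_t=40, n_h=1)       # persons wrongly treated as independent
print(gap_households, gap_persons, gap_naive)  # 1/10 1/10 1/40
print(gap_persons / gap_naive)                 # 4
```

In this stylized case the gap implied by an independence assumption understates the clustered gap by a factor equal to the household size.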

2.6.2 Difficulty of Correcting Index Bias When Using Data for Persons

Our conclusion that the expected impact of bias on segregation index scores is fundamentally the same regardless of whether segregation is measured using data for households or data for persons has broader implications than may be apparent on first consideration. We note three important implications here:

  • The impact of bias on index scores calculated using data for persons is much greater than most researchers are likely to appreciate because it is greater than would be indicated by estimating index bias using either the analytic formulas introduced by Winship (1977) or the bootstrap simulation methods from Carrington and Troske (1997) in combination with data for persons.

  • It is much more difficult to obtain unbiased index scores when using data for persons instead of data for households.

  • In general, it is easier to obtain unbiased index scores when using data for households and, at least for U.S. communities, the results will be similar to results obtained using data for persons.

On the first point, Winship’s (1977) analytic formulas for estimating index bias are explicitly formulated to be applied to households. The formulas are based on probability models that assume the locational outcomes for micro-level units are independent events. This assumption is reasonable for households. But it is not reasonable for persons. Most persons reside in racially homogeneous households and the locational outcomes for persons within households are strongly correlated. Spouses, partners, children, etc. do not locate independently; they locate in “clusters” and the clusters are homogeneous on race. This combination produces levels of bias in index scores that are much higher than would result if individuals within households located independently. A rough-and-ready rule of thumb is that, when compared to the level of bias estimated using data for persons in combination with an incorrect assumption that persons in households locate independently, the true level of bias will be higher by a multiple equal to the average size of households.Footnote 16

In view of this, the analytic formulas outlined in Winship (1977) cannot be naively applied to data for persons. To be appropriate for use with data for persons, Winship’s formulas must be modified to take account of the fact that persons locate in racially homogeneous household clusters. The same conclusion also applies to Carrington and Troske’s (1997) method of estimating index bias using bootstrap sampling methods. Their methodology is explicitly crafted for use with data for persons, but the research context is measuring occupational sex segregation, a context where it is reasonable to assume a model of independence of events across persons. Their bootstrap method for estimating expected values for measures of residential segregation is appropriate to use with data for households but it is not appropriate to use with data for persons. To use the method with data for persons, the bootstrap simulation procedure must be modified to assign household-specific clusters of persons randomly to locations. We conducted a methodological study where we implemented this approach using detailed tabulations of race by size of household. The findings we obtained were simple and clear. Random assignment of persons in racially homogeneous household-specific clusters produces much higher expected values for index scores than independent random assignment of individual persons. Moreover, the expected values for index scores obtained using the complex methods needed for data for persons were comparable to the expected values obtained using the much simpler methods that can be used with data for households.
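The contrast between the two random-assignment models can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions, not the procedure used in the methodological study: it uses random permutation rather than Carrington and Troske’s bootstrap, fixes every household at four persons instead of using tabulations of race by household size, and works with a hypothetical community of 200 equal-sized blocks. All function names are illustrative.

```python
import random

def separation_index(blocks):
    """S as the group difference in mean contact with the reference group.

    blocks: list of (ref_count, comp_count) person counts per block.
    """
    ref_total = sum(r for r, c in blocks)
    comp_total = sum(c for r, c in blocks)
    # Contact with the reference group in a block is its share p = r / (r + c).
    ref_mean = sum(r * r / (r + c) for r, c in blocks if r + c) / ref_total
    comp_mean = sum(c * r / (r + c) for r, c in blocks if r + c) / comp_total
    return ref_mean - comp_mean

def random_assignment(units, units_per_block, persons_per_unit):
    """Shuffle units in place (1 = reference, 0 = comparison) into equal-size blocks."""
    random.shuffle(units)
    blocks = []
    for start in range(0, len(units), units_per_block):
        chunk = units[start:start + units_per_block]
        ref = sum(chunk) * persons_per_unit
        comp = (len(chunk) - sum(chunk)) * persons_per_unit
        blocks.append((ref, comp))
    return blocks

def mean_s(units, units_per_block, persons_per_unit, reps=30):
    """Average S over repeated random assignments."""
    total = 0.0
    for _ in range(reps):
        total += separation_index(
            random_assignment(units, units_per_block, persons_per_unit))
    return total / reps

random.seed(0)
# Hypothetical community: 200 blocks, 10 households per block,
# 4 persons per household, 80% of households in the reference group.
households = [1] * 1600 + [0] * 400
persons = [1] * 6400 + [0] * 1600

s_clustered = mean_s(households, 10, 4)  # household clusters placed at random
s_independent = mean_s(persons, 40, 1)   # persons placed independently
print(round(s_clustered, 3), round(s_independent, 3))
```

Under these assumptions the clustered expectation should fall near 1/10 and the independent expectation near 1/40, so that the ratio is roughly the household size of four.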

With this, the importance of the second highlighted implication comes into clear relief. There are no established methods for obtaining unbiased index scores when using simple (one-way) tabulations of persons by race for spatial units. We specify simple, one-way tabulations of persons by race because this is the kind of data used in most empirical studies of residential segregation. Additionally, while it is technically possible to obtain unbiased index scores when using data for persons, it requires that researchers work with detailed tabulations of persons by race and size of household. In many situations the requisite data are not available. Moreover, when the needed data are available there is little practical justification for going to the considerable extra time and effort because the results will be similar to results obtained by applying simpler methods to data for households.

Size of household varies considerably across households in general. Additionally, the central tendency and dispersion of the distribution of households by size varies over time, across racial/ethnic groups, across residential areas within communities, across metropolitan and nonmetropolitan settings, and more. Simple approaches to taking account of clustering of persons within households while working with data for persons – for example, dividing by average household size to convert person data to approximate household data – do not work well and are inferior to calculating unbiased index scores using simple tabulations of race by spatial units for households. This leads us to a simple and important conclusion. When data for households are available, unbiased index scores computed using these data are superior to all other feasible and practical options.

2.7 Contrasting the Dissimilarity Index and the Separation Index for Measuring Segregation

We measure segregation using two indices, the dissimilarity index (D) and the separation index (S). Both indices are well-established and have been used extensively since the earliest days of quantitative measurement of segregation. The dissimilarity index is the most commonly used measure of uneven distribution. It is popular in part because it is easy to compute.Footnote 17 Additionally, D has a substantive interpretation many researchers view as both appealing and easy to convey to nontechnical audiences. Namely, the value of D indicates the minimum proportion of households from either group in the comparison that would have to relocate to a different area to achieve even distribution – a pattern where the ethnic composition of every area matches the ethnic composition of the community overall. We also note a disparity formulation of D has an appealing interpretation as well and is easy to convey to broad audiences. Under this formulation, the value of D is the group difference in the percentage of households that reside in parity areas; areas where the relative presence of the reference group (p) equals or exceeds the level seen for the community overall (P).
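The two interpretations of D described above can be compared in a brief sketch. It implements the conventional formula for D, not the refined unbiased version used in our analysis, and the area counts and function names are hypothetical.

```python
def dissimilarity(areas):
    """Conventional D = 0.5 * sum over areas of |r_i / R - c_i / C|.

    areas: list of (ref_count, comp_count) household counts per area.
    """
    R = sum(r for r, c in areas)
    C = sum(c for r, c in areas)
    return 0.5 * sum(abs(r / R - c / C) for r, c in areas)

def dissimilarity_parity(areas):
    """D as the group difference in the share of households in parity areas.

    Parity areas are those where the reference-group share p is at or above
    the community-wide share P.
    """
    R = sum(r for r, c in areas)
    C = sum(c for r, c in areas)
    P = R / (R + C)
    parity = [(r, c) for r, c in areas if r + c and r / (r + c) >= P]
    return sum(r for r, c in parity) / R - sum(c for r, c in parity) / C

# Hypothetical household counts (reference group, comparison group) by area.
areas = [(90, 10), (80, 20), (60, 40), (10, 90)]
d_standard = dissimilarity(areas)
d_parity = dissimilarity_parity(areas)
print(round(d_standard, 4), round(d_parity, 4))  # 0.5208 0.5208
```

Both functions return the same value because areas on each side of the parity line contribute with a common sign to the sum of absolute differences.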

While D is a workhorse in empirical segregation studies, it has well-known technical deficiencies documented in methodological studies (Zoloth, 1976; Winship, 1977; James & Taeuber, 1985; White, 1986; Reardon & Firebaugh, 2002; Fossett, 2017). In particular, D is known for being sensitive to how groups are differentially distributed across below-parity areas and parity areas, distinguished by whether p ≥ P, but insensitive to how groups are differentially distributed across areas on the same side of the parity line (White, 1986: 203; James & Taeuber, 1985: 13; Reardon & Firebaugh, 2002: 51; Fossett, 2017). Thus, it is fair to describe D as sensitive to group differences in the fraction of households displaced into below-parity areas but insensitive to whether non-parity areas are close to parity or far from parity. Alternatively, D is sensitive to group inequality in ordinal distribution across areas classified as below-parity, parity, and above-parity but insensitive to group inequality on quantitative outcomes on area group composition. Another characteristic of the dissimilarity index is particularly relevant for our study; D is especially prone to upward bias and the problem is acute when assessing segregation comparisons involving small populations and small spatial units (Winship, 1977; Carrington & Troske, 1997; Fossett, 2017). We use Fossett’s (2017) refined formulation of D discussed earlier to overcome this problem.

Like D, the separation index (S) has been used extensively from the earliest days of quantitative research on residential segregation. But this fact is underappreciated because S has been used under a variety of different names over many decades including, in rough chronological order, the revised index of isolation (Bell, 1954), the correlation ratio or eta squared index (Duncan & Duncan, 1955; Stearns & Logan, 1986; White, 1986), the segregation index (Zoloth, 1976), the Coleman rij index (Coleman et al., 1975, 1982), the variance ratio index (James & Taeuber, 1985), the normalized exposure index (Reardon & Firebaugh, 2002), and the separation index (Fossett, 2017). The separation index consistently fares well in methodological studies (e.g., James & Taeuber, 1985; White, 1986; Reardon & Firebaugh, 2002; Fossett, 2017). For example, in contrast to D, S registers group differences in distribution across non-parity areas and thus is sensitive not only to the extent to which groups differentially reside in non-parity areas but also to whether they reside in areas that are near or far from parity.Footnote 18 Additionally, in comparison to D, S is much less susceptible to distortion by index bias (Winship, 1977; Fossett, 2017).

The separation index has multiple interpretations. Unfortunately, interpretations of S offered in earlier decades often highlighted the measure’s correspondence with terms from analysis of variance and correlation.Footnote 19 These interpretations are, of course, technically correct. But they are not particularly effective in evoking the substantive relevance of segregation in a way that is intuitive and easy to convey to broader audiences. We favor a more appealing interpretation of S as a disparity measure; for example, in the context of majority-minority segregation, S reflects the majority-minority difference in average contact with the majority. This interpretation emerges naturally from the difference of means computing formula for S reviewed earlier.Footnote 20 It is attractive because it is easy to explain. If majority and minoritized group households live together, the contact households from both groups have with majority households will not differ. But, if majority households live apart from minoritized group households, the contact difference will grow and take the value 1.0 in the extreme where the two groups never reside in the same area.
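The disparity interpretation of S can be sketched in a few lines. The code implements the conventional difference-of-means calculation, not the unbiased version used in our analysis, and the function name and area counts are hypothetical.

```python
def separation_index(areas):
    """S as the majority-minority difference in average contact with the majority.

    areas: list of (majority_count, minority_count) household counts per area.
    Contact with the majority in an area is the majority share p of that area.
    """
    M = sum(m for m, n in areas)
    N = sum(n for m, n in areas)
    majority_mean = sum(m * m / (m + n) for m, n in areas if m + n) / M
    minority_mean = sum(n * m / (m + n) for m, n in areas if m + n) / N
    return majority_mean - minority_mean

# Groups living together: every area matches the community composition.
s_together = separation_index([(80, 20), (40, 10), (160, 40)])
# Groups living apart: no area contains households from both groups.
s_apart = separation_index([(100, 0), (60, 0), (0, 40)])
print(round(s_together, 6), round(s_apart, 6))  # 0.0 1.0
```

The two hypothetical communities reproduce the limiting cases described above: no contact difference when the groups live together and a difference of 1.0 when they never share an area.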

2.7.1 Segregation as Stratification and the Resonance of the Separation Index

Fossett (2017) advocates referring to S as the “separation index” because, among all measures of uneven distribution, values of S provide the most reliable signal of the extent to which groups are separated in their distribution across spatial units such that substantial portions of both groups live apart from each other in different spatial units that are highly polarized (homogeneous) on group composition. In this regard, we argue S resonates nicely with a definition of segregation offered by Massey and Denton (1988), which we recall again here:

[R]esidential segregation is the degree to which two or more groups live separately from one another, in different parts of the urban environment. (Massey & Denton, 1988, emphasis added)

We endorse and highlight this definition of segregation for two reasons. The first is that this definition implies a logical connection of segregation and group inequality on location-based stratification outcomes. Specifically, when groups are residentially separated in the sense of living apart from one another in different spatial units or spatial domains, it becomes logically possible for the groups to have unequal social, economic, and health-related outcomes that are linked to area of residence. Additionally, if groups are not separated across spatial units, inequality on stratification outcomes linked to spatial location is not logically possible. Thus, separation is a necessary but not sufficient condition for segregation to have implications for group stratification on social, economic, and health attainments.

We believe most studies of residential segregation are like our study in being motivated by a fundamental assumption that segregation has important consequences for group inequality in life chances and a wide range of stratification outcomes. When this is true, the separation index is clearly superior to the dissimilarity index in ability to signal whether the pattern of segregation in a community creates the logical preconditions for inequality on location-based outcomes (Fossett, 2017). The separation index can take high values under only one condition, when groups are separated into different spatial units that are polarized on group composition, as occurs when the minoritized group is concentrated in enclaves, barrios, or ghettos (Stearns & Logan, 1986; Fossett, 2017). Thus, high values of S provide a strong, reliable signal that groups are separated across spatial units such that both groups live in neighborhoods where their group predominates and the other group is largely absent. Speaking of White-Black segregation, Stearns and Logan view this to be substantively important arguing, “The fact that some neighborhoods reach very high concentrations of black population has profound economic and political consequences for those neighborhoods” and noting that Black neighborhoods are subject to redlining, business disinvestment, siting of unwelcome developments (prisons, halfway houses, low income housing, garbage compacting sites, etc.) and avoidance of desirable developments (libraries, parks, universities, etc.) (Stearns & Logan, 1986:127–128).

Importantly, D does not provide a reliable signal regarding group separation across different spatial units. To the contrary, it is both logically possible and empirically common for D to take high values when groups are not separated and instead both groups live together in neighborhoods that are relatively close to parity (Fossett, 2017). The possibility for D to take high scores in the absence of group separation is due to the fact that D only registers group differences in distribution across the parity line and is insensitive to group differences in distribution across spatial units on the same side of the parity line. This is a serious flaw for measuring group separation because group differences in distribution across below-parity areas and/or across above-parity areas can vary widely with major consequences for group separation, creating the possibility for D and S to take different combinations of values which have different substantive implications.

2.7.2 Making Sense of D-S Combinations

We follow recommendations offered by Fossett (2017) and examine values of both D and S. Reporting and reviewing values of D helps maintain continuity with previous studies which often report only scores for D. It also is useful because different combinations of scores for D and S provide a basis for characterizing the pattern of segregation in a community for a given group comparison:

  • When S takes high values, one can be certain that groups are separated in space in a pattern of prototypical segregation, or polarized unevenness, which establishes the logical preconditions for group inequality on location-based stratification outcomes. Since values of D cannot be lower than values of S, a high value on D will always occur when the value of S is high.

  • When D takes high values, one cannot know with certainty whether groups are separated in space. It is a logical possibility and if this is the case S will also take a high value. But it also is logically possible and empirically common for a high value of D to result from a pattern of dispersed displacement from even distribution, or dispersed unevenness, which does not involve high levels of group separation and thus will occur in combination with a low value on S.

The potential for D and S to take different, potentially highly discrepant, values is not widely appreciated. As a result, researchers and broader audiences alike routinely, but incorrectly, assume that high values of D are a strong signal of an underlying pattern of segregation involving group separation and possibly substantial group inequality on location-based outcomes. This is not the case.
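A stylized numerical example shows how far D and S can diverge. Both communities below are hypothetical constructions, and the functions implement the conventional formulas for D and S rather than the unbiased versions used in our analysis.

```python
def dissimilarity(areas):
    """Conventional D for a list of (majority_count, minority_count) areas."""
    M = sum(m for m, n in areas)
    N = sum(n for m, n in areas)
    return 0.5 * sum(abs(m / M - n / N) for m, n in areas)

def separation_index(areas):
    """S as the group difference in average contact with the majority."""
    M = sum(m for m, n in areas)
    N = sum(n for m, n in areas)
    majority_mean = sum(m * m / (m + n) for m, n in areas if m + n) / M
    minority_mean = sum(n * m / (m + n) for m, n in areas if m + n) / N
    return majority_mean - minority_mean

# Dispersed unevenness: all minority households live in areas that are
# 80% majority, while the remaining areas are entirely majority.
dispersed = [(80, 20)] * 25 + [(100, 0)] * 75
# Polarized unevenness: minority households are concentrated in a few
# areas that are 95% minority.
polarized = [(5, 95)] * 5 + [(100, 0)] * 95

d_disp, s_disp = dissimilarity(dispersed), separation_index(dispersed)
d_pol, s_pol = dissimilarity(polarized), separation_index(polarized)
print(round(d_disp, 3), round(s_disp, 3))  # 0.789 0.158
print(round(d_pol, 3), round(s_pol, 3))    # 0.997 0.948
```

In the dispersed case D is high even though every minority household lives in an area that is 80% majority, whereas S remains low. In the polarized case both indices are high.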

We use the chart in Table 2.2 to lay out the various logical possibilities for alignment of values of D and S in combination with possibilities for group inequality on location-based stratification outcomes.Footnote 21 The chart makes clear that the pattern of dispersed unevenness, characterized empirically by the combination of a high value for D and a low value for S, does not carry even the possibility of group inequality on location-based outcomes. In contrast, a high value on S does carry the logical possibility of group inequality on location-based outcomes, but only by signaling the necessary precondition of group separation across spatial units. If undesirable location-based outcomes are limited to areas where the minoritized group is the predominant presence and favorable location-based outcomes are limited to areas where the majority is the predominant presence, a pattern often approximated in communities across the United States, separation will in fact be associated with group inequality. But it is logically possible that separation can be high without inequality being high. This would result, for example, if all low-income households experienced similar undesirable location-based outcomes, and all high-income households experienced similar favorable location-based outcomes, but majority and minoritized group households lived apart in different spatial units. In this hypothetical example, inequality in location-based stratification outcomes is a strict function of income, not race. It is a logical possibility, but it is not widely observed in communities across the United States.

Table 2.2 Logically possible outcomes on dissimilarity (D), separation (S), and group inequality on location-based stratification outcomes

We conclude this section by noting that we view the separation index to be the best available measure for identifying when segregation patterns in a community involve groups living separate and apart from each other. If the value of S is low, we know that group inequality on stratification outcomes linked to residential location cannot be large because the two groups in the comparison are, to a substantial degree, residing in the same areas and necessarily experiencing similar location-based outcomes. If the value of S is high, we know the two groups are living apart and thus can potentially have unequal outcomes on attainments tied to residential location. We also occasionally report and discuss values of the dissimilarity index because it is familiar and widely used. So, we believe it provides a useful point of reference for readers who prefer D over S for their own reasons as well as to facilitate comparisons with other research that reports only scores for D. When D and S agree, interpretations and conclusions are relatively simple. When D and S do not agree, it will result because D is taking a high value under conditions of dispersed unevenness where most households reside in areas that are relatively close to parity. The possibility of this condition is not widely appreciated. Perhaps for that reason, the literature provides little basis for assigning substantive significance to a high-D low-S situation.Footnote 22

2.7.3 Examining Empirical Examples of Selected D-S Combinations

Take the example of how groups are distributed across below-parity areas in the common situation of measuring majority-minority segregation in a community where the minoritized group is the smaller population. Separation will be maximized when minoritized group households are concentrated in below-parity areas that are homogeneous or near-homogeneous and separation will be minimized when all minoritized group households reside in below-parity areas that are as close to parity as possible. The first outcome occurs under a pattern Fossett (2017) terms “prototypical” segregation. The adjective “prototypical” is apt because it suggests the notion that comes to the minds of researchers and lay audiences when a community is characterized as being highly segregated. If S takes a high value, one can be certain segregation takes the form of prototypical segregation, which we have also referred to as polarized unevenness. The same cannot be said for D. Instead, while D will indeed be high whenever S is high, the value of D is not a reliable signal of separation and polarized unevenness because D also can take high values under the pattern of dispersed unevenness. Dispersed unevenness does not involve separation. Yet D, but not S, can take high values under this condition.

The contrast can be clarified using graphical representations of the residential patterns for a few communities. The first example we review is White-Black segregation in Chicago, IL in 1990 in Fig. 2.1. The case of Chicago has for nearly a century been seen as a distinctive pattern of polarized unevenness because it involves the two groups in question living in separate parts of the urban environment, here with a large proportion of Black households residing in predominantly Black neighborhoods located on the South side of Chicago and with a large proportion of White households residing in predominantly White neighborhoods on the North side and in the suburbs surrounding Chicago. The high level of separation is reflected by the high value of 79.7 for S and its close correspondence to the high value of 87.0 for D. The polarization chart in Fig. 2.1 visually depicts the extent of group separation by plotting each group’s distribution across areas by level of percent White. The two frequency polygons form a distinctive combination of a left-peaked “L” curve for Black households registering the fact that more than 70% of Black households live in areas that are at least 90% Black (less than 10% White) and a mirror image right-peaked “J” curve registering the fact that more than 85% of White households live in areas that are at least 90% White.

Fig. 2.1 Group distributions in Chicago, IL, 1990. Panel A: polarization chart (relative frequency versus proportion White, plotted separately for White and Black households). Panel B: neighborhood grid chart (north-south versus west-east grid location).

Under even distribution the polarization chart would form a “JL” pattern, not an “LJ” pattern, with both curves peaking at 80 on the X-axis, the percentage of White households in the combined population, and falling away from there. The thick dashed vertical lines mark the respective group means for contact with White households at 95.6 for White households and 16.2 for Black households with the difference determining the value of 79.7 for the separation index. The polarization chart is so named because it clarifies that when values of S are high one can be certain that a large fraction of White households are residing in predominantly White neighborhoods and also a large fraction of Black households are residing in predominantly Black neighborhoods.

The neighborhood grid chart in Fig. 2.1 depicts this pattern using an alternative visualization approach. Specifically, the distributions of White households and Black households across areas are projected onto a stylized overhead view of the city as consisting of 400 neighborhoods arranged in a 20 × 20 neighborhood grid where each neighborhood has 25 households. This creates an abstract representation of a city with a total of 10,000 households. The chart is constructed by ordering the actual areas of the city on the basis of relative group presence – proportion White in this case – and then projecting the areas onto the neighborhood grid on a proportional basis. Homogeneous White areas are filled in first, starting in the Southwest corner of the grid; areas with mixed group presence are then filled in across the middle portion of the grid, working from areas with greater White presence to areas with greater Black presence; and homogeneous Black areas are filled in last, finishing in the Northeast corner.
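The projection described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the procedure used to draw the published charts: each grid cell is assigned the composition of the actual area found at the cell's midpoint in the cumulative population ordering, and cells are filled by sweeping diagonals from the Southwest corner to the Northeast corner. The function name, the fill order, and the household counts are all hypothetical.

```python
def neighborhood_grid(areas, side=20):
    """Project areas onto a side x side grid of proportion White (row 0 = south)."""
    populated = [(w, b) for w, b in areas if w + b > 0]
    # Order areas from highest to lowest proportion White.
    ordered = sorted(populated, key=lambda wb: wb[0] / (wb[0] + wb[1]), reverse=True)
    total = sum(w + b for w, b in ordered)

    # Cumulative population share reached at the end of each ordered area.
    bounds, running = [], 0
    for w, b in ordered:
        running += w + b
        bounds.append(running / total)

    # Assumed fill order: diagonal sweep from the SW cell (0, 0) to the NE cell.
    cells = sorted(((x, y) for x in range(side) for y in range(side)),
                   key=lambda c: (c[0] + c[1], c[0]))

    grid = [[0.0] * side for _ in range(side)]
    idx = 0
    for k, (x, y) in enumerate(cells):
        midpoint = (k + 0.5) / (side * side)
        while bounds[idx] < midpoint and idx < len(ordered) - 1:
            idx += 1
        w, b = ordered[idx]
        grid[y][x] = w / (w + b)
    return grid


# Hypothetical (White, Black) household counts per area, invented for illustration.
grid = neighborhood_grid([(0, 370), (1500, 80), (8000, 50)])
print(grid[0][0], grid[19][19])  # SW corner cell, NE corner cell
```

Because each cell stands for the same share of the population (25 of 10,000 households), the number of cells shaded for a given composition is proportional to the population living in areas of that composition.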

A cross-diagonal line falling from left to right is superimposed on the grid to provide a visual reference for how the city would be divided into homogeneous White and Black regions under maximum segregation.Footnote 23 In Chicago, the neighborhood grid chart depicts a striking visual pattern. The overwhelming majority of areas on either side of the diagonal line are either 100% White (on the southwest side of the diagonal line) or 100% Black (on the northeast side of the diagonal line). From this, it is obvious that it is possible for White and Black households to have fundamentally different experiences on stratification outcomes that are tied to residential location including, for example, city services, schools, amenities, infrastructure, mortgage loan redlining, and more.

These patterns are not surprising because White-Black segregation in Chicago is perhaps the most widely known and studied empirical case in the broader literature on residential segregation. But is it representative? And, relatedly, is a high value of D such as the one observed in Chicago always a reliable signal that a clear pattern of group separation is present as seen in Chicago? We have already tipped our hand regarding the answers to these questions. While Chicago is an important case, it is likely not representative because high values of D do not necessarily involve group separation as seen in Chicago.

We justify these conclusions by reviewing data for White-Black segregation in 1990 in four additional cities: Tampa-St. Petersburg, FL (Fig. 2.2); Lubbock, TX (Fig. 2.3); Topeka, KS (Fig. 2.4); and Erie, PA (Fig. 2.5). The results for Tampa-St. Petersburg and Lubbock are similar to the results seen for Chicago. Both cities have high levels of polarized unevenness, albeit not quite as high as in Chicago, with values of D and S being relatively close as is characteristic of the pattern of polarized unevenness; D and S are 80.6 and 61.9, respectively, in Tampa-St. Petersburg and 74.3 and 58.0, respectively, in Lubbock.Footnote 24

Fig. 2.2 Group distributions in Tampa-St. Petersburg, FL, 1990. Panel A: polarization chart (relative frequency versus proportion White, plotted separately for White and Black households). Panel B: neighborhood grid chart (north-south versus west-east grid location).

Fig. 2.3 Group distributions in Lubbock, TX, 1990. Panel A: polarization chart (relative frequency versus proportion White, plotted separately for White and Black households). Panel B: neighborhood grid chart (north-south versus west-east grid location).

Fig. 2.4 Group distributions in Topeka, KS, 1990. Panel A: polarization chart (relative frequency versus proportion White, plotted separately for White and Black households). Panel B: neighborhood grid chart (north-south versus west-east grid location).

Fig. 2.5 Group distributions in Erie, PA, 1990. Panel A: polarization chart (relative frequency versus proportion White, plotted separately for White and Black households). Panel B: neighborhood grid chart (north-south versus west-east grid location).

In both Tampa-St. Petersburg and Lubbock, the polarization chart has the “LJ” pattern associated with group separation and polarized unevenness, and, correspondingly, the neighborhood grid charts for the two cities depict clear separation of groups with a majority of areas on the southwest side of the diagonal being homogeneously White and the majority of areas on the northeast side of the diagonal being homogeneously Black. These visualizations of the segregation pattern make it clear that the separation of groups into different areas in the city creates the logical possibility for the groups to experience inequality in stratification outcomes linked to residential location.

The results for Topeka and Erie document a pattern of uneven distribution that is fundamentally different from polarized unevenness. Instead, they follow the pattern of dispersed unevenness. The hallmark of this pattern is that groups differ substantially on the outcome of residing in below-parity areas but at the same time they generally reside together in neighborhoods that are relatively close to parity and do not reside apart in areas that are polarized on group composition. A primary feature of the pattern is that values of D are high, as high as seen in Tampa-St. Petersburg and Lubbock, but values of S are markedly lower. The value of D for Erie is 75.2, higher than the value of D for Lubbock, but the value of S is 36.9, not even half of the value of D for Erie and some 21.1 points lower than the value of 58.0 for S for Lubbock. Our quantitative guideline for polarized unevenness is not met and our quantitative guideline for identifying a pattern of dispersed unevenness is met.Footnote 25

The polarization charts for Topeka and Erie depart dramatically from the corresponding charts for Tampa-St. Petersburg and Lubbock. In contrast to the first two cities, Topeka and Erie do not have the “LJ” pattern associated with group separation. The reason for this is that, while the Black populations in these cities do generally reside in below-parity areas (i.e., areas where p < P), the Black populations in these cities are not concentrated in predominantly Black areas, a necessary condition to form the “L” in the “LJ” pattern characteristic of group separation. Instead, the Black populations in these cities are dispersed across a wide range of neighborhoods with a clear majority living in majority White areas and with the two most common (i.e., modal) neighborhood results being areas that are 80–89% White and 90–100% White!

The pattern of dispersed unevenness is reflected in three obvious ways in the neighborhood grid charts for these two cities. The first is that, in vivid contrast to the same charts for Tampa-St. Petersburg and Lubbock, only a few of the areas above the diagonal are 100% or near-100% Black. Second, many of the areas above the diagonal are majority White! And, lastly, as a byproduct of the first two points, a large fraction of the Black population is dispersed across predominantly White areas in the region below the diagonal. In brief, few Black households live apart from White households in areas that are predominantly Black, and most Black households co-reside with White households in majority White areas that, while technically below parity, are relatively close to parity.

This pattern for group residential distributions has great substantive significance. When segregation takes this pattern, the potential consequences of segregation for group inequality on stratification outcomes linked to neighborhood location are blunted. White-Black differences in exposure to substandard city infrastructure, unfavorable treatment in mortgage lending, poor public services, food deserts, noxious odors and noise from industrial sites, hazardous wastes and emissions, and so on, cannot be as large as under the pattern of polarized unevenness because most Black households are living in neighborhoods where more White than Black households are experiencing the same location-based outcomes. So, if one’s interest in segregation is for its relevance for social inequality, the low values of the separation index are directly informative and readily distinguish between cities with polarized unevenness such as Chicago, Tampa-St. Petersburg, and Lubbock and cities with dispersed unevenness such as Topeka and Erie. In contrast, the high values of the dissimilarity index are unreliable and often misleading. The values of D are not low in any of these cities and, for example, D is higher in Erie (75.2) than in Lubbock (74.3), even though the segregation pattern in Lubbock is clearly fundamentally different from the pattern in Erie. The separation index readily distinguishes between the two patterns where the dissimilarity index utterly fails.

We chose examples featuring White-Black segregation in metropolitan areas in 1990 where P varies in a narrow range (92.0–95.6) to keep the differences between the cities considered to a minimum. But even within this relatively narrow scope, we are able to provide compelling examples of how D and S provide different insights into residential segregation. The separation index is clearly superior with regard to being able to identify patterns of segregation that contain the precondition of group separation across areas that creates the logical possibility for group differences in stratification on life chances and opportunities based on residential segregation.

We conclude this section by noting that the differences between D and S take on great importance in the findings we report in our analysis chapters. For example, we will see that the pattern of polarized unevenness, the pattern most closely linked to segregation and racial stratification, is much more common for White-Black segregation than for White-Latino segregation and White-Asian segregation, with the pattern of polarized unevenness in fact being quite rare for White-Asian segregation. We also show that White-Latino segregation generally follows the pattern of polarized unevenness in areas of established Latino presence but takes the form of dispersed unevenness in areas of limited Latino presence. Relatedly, we show that in general White-Latino segregation in new destination communities initially takes the form of dispersed displacement in the early stages of Latino settlement but as the Latino population becomes more established, segregation shifts toward the form of polarized unevenness. This leads to the seeming paradox where values of D are falling over time in Latino new destination communities while values of S are rising. The examples we review here illustrate how this can happen. Values of D are uninformative about group separation; they register the group difference in relative distribution across below-parity areas and parity areas but, since D is insensitive to how groups are distributed across below-parity areas, a high value on D can readily reflect either low or high separation of groups. Consequently, it is not only logically possible for group separation as measured by S to be rising while group differences in residing in below-parity areas are stable or falling, it is in fact a typical pattern for White-Latino segregation in new destination communities.

2.7.4 Dissimilarity, Separation, and Isolation Indices

Empirical studies of residential segregation often rely solely on the dissimilarity index to measure uneven distribution. But many studies also consider the “p-star” (P*) isolation index (I) as a supplement to D. As a general practice we view this as perfectly fine because the isolation index provides potentially interesting information about the level of same-group contact a particular group experiences. At the same time, however, we also stress that this does not change the value of, and need for, examining the separation index if one is interested in group separation. While the separation index and the isolation index both can be formulated in terms of same-group contact calculations, S is conceptually and mathematically distinct from I in two fundamental ways. First, following the standard practice adopted when measuring uneven distribution, contact terms relevant for the separation index are calculated using just the counts for the two groups in the comparison while contact terms relevant for the isolation index are calculated using counts for all groups in the population. This can lead the respective contact terms to take very different values depending on whether groups other than the two in the segregation comparison are present in the community. Second, even if the isolation index is computed using just the pairwise group counts, there is no necessary relationship between the scores for I and S. The isolation index registers the level of same-group contact, which is determined by two factors: the relative size of the two groups, which sets the “floor,” or minimum value, for same-group contact, and uneven distribution, which can raise the value of same-group contact to 1.0 under complete segregation. In contrast, S registers same-group contact in relation to its expected value and has no necessary relationship with relative group size. Thus, in the absence of uneven distribution, the expected value of S will be zero (0) regardless of the relative size of the groups. The value of S rises above zero only when (pairwise) same-group contact is higher than would be expected under integration and the value S takes will be a function of the degree to which groups reside in different areas and is not inherently related to relative group size. Accordingly, values of S can range from 0.0 to 1.0 at any given value of I. Thus, while values of the isolation index can be interesting in their own right, knowledge of I does not provide a reliable signal on group separation. The separation index is a direct, reliable measure of group separation, while D and I are not.
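The two distinctions drawn here can be illustrated with a small sketch. The Python below uses the standard P* isolation formula computed over all groups and the standard pairwise form of S, not the bias-corrected versions used in this study. The cities, group labels, and function names are hypothetical and chosen only to show that, under even distribution, S is zero while I simply reflects group share.

```python
def isolation(areas, group):
    """P* isolation: the group's mean same-group share of area population (all groups)."""
    G = sum(a[group] for a in areas)
    return sum((a[group] / G) * a[group] / sum(a.values())
               for a in areas if sum(a.values()) > 0)


def pairwise_separation(areas, g1, g2):
    """S for g1 versus g2, using only the counts for the two groups compared."""
    G1 = sum(a[g1] for a in areas)
    G2 = sum(a[g2] for a in areas)
    s = 0.0
    for a in areas:
        pair_total = a[g1] + a[g2]
        if pair_total > 0:
            s += (a[g1] / G1 - a[g2] / G2) * a[g1] / pair_total
    return s


# Hypothetical cities in which every area has the same composition (even distribution).
two_group_city = [{"white": 80, "black": 20, "other": 0} for _ in range(3)]
three_group_city = [{"white": 80, "black": 20, "other": 100} for _ in range(3)]

for city in (two_group_city, three_group_city):
    print(isolation(city, "black"), pairwise_separation(city, "white", "black"))
```

In both hypothetical cities S is zero because the White and Black distributions are even. Black isolation equals the Black share of the population in each case (0.2 with two groups, 0.1 once the third group is counted), so I moves with group composition even though group separation is unchanged.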

2.7.5 Further Comments to Guide Interpretations of Values for Dissimilarity and Separation Indices

To this point we have only discussed D-S combinations that have distinctively different substantive implications. Here we provide guidance on how the wide range of possible intermediate D-S combinations can be characterized and interpreted. To begin, we note that it is always logically possible for scores of D and S to take the same values; this occurs under a condition of maximum polarization of non-parity areas on group composition, which is realized when all non-parity areas are homogeneous (Fossett, 2017).Footnote 26 This corresponds to the pattern of polarized unevenness where displacement of group distributions from even distribution maximizes group separation. The value of S will be lower than the value of D if any non-parity areas are not fully polarized (i.e., are not homogeneous). In empirical residential distributions, many non-parity areas will be less than homogeneous, even in high segregation situations like White-Black segregation in Chicago, due to a variety of idiosyncratic factors such as mixed areas that occur along the boundaries of transition between homogeneous portions of urban space. So, it is appropriate to characterize D-S combinations where S is lower than D as polarized unevenness – wherein displacement from even distribution involves a high degree of group separation – so long as values of S are fairly close to values of D.

On the other end of the continuum, the pattern of dispersed unevenness – uneven distribution without group separation – always involves a large D-S difference. In general, this occurs when no non-parity areas are homogeneous and instead most areas are quantitatively relatively close to parity. Fossett (2017) provides a technical review of the logical possibilities. The point most relevant for the present discussion is that when D is at a medium or higher level (e.g., D ≥ 40) the pattern of maximal or near-maximal dispersed unevenness will produce a large D-S difference and the difference can be very large when the groups are imbalanced in size.Footnote 27 The potential for D-S divergence flows from the fact that they measure different aspects of uneven distribution. The separation index registers group separation which is greater when non-parity areas are quantitatively further from parity and polarized on group composition. In contrast, D registers group displacement into non-parity areas without regard for whether the areas are polarized or relatively near parity. Thus, a larger D-S difference is a sign that uneven distribution involves dispersed unevenness into near parity areas and low group separation while a smaller D-S difference is a sign of prototypical segregation wherein uneven distribution involves group separation into non-parity areas that are polarized on group composition.

In empirical analysis, residential distributions in even the most highly segregated communities typically include some non-parity areas with intermediate (non-homogeneous) group composition. Consequently, values of S in empirical studies approach, but do not equal, values of D even in exemplars of extreme segregation such as White-Black segregation in Chicago, Cleveland, Detroit, and Milwaukee, which come the closest to embodying an S ≈ D combination characteristic of polarized unevenness. Our analysis of the empirical relationship of D and S over thousands of group comparisons suggests polarized unevenness follows the pattern $D = S^{2/3}$, or, alternatively, $S = D^{3/2}$, as reflected, for example, in the combinations of D = 0.90 and S = 0.85, D = 0.75 and S = 0.65, D = 0.55 and S = 0.40, and D = 0.45 and S = 0.30. These empirical relations provide a basis for classifying patterns of prototypical segregation on a continuum from low to high as suggested in the chart in Table 2.3.
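The stated relationship can be checked against the four combinations quoted above with a few lines of arithmetic. The Python below is only a convenience sketch; the helper name is an assumption, and the quoted S values appear to be rounded, so small differences from the computed values are expected.

```python
# (D, S) combinations quoted in the text, on the 0-1 scale.
combos = [(0.90, 0.85), (0.75, 0.65), (0.55, 0.40), (0.45, 0.30)]


def implied_separation(d):
    """S implied by the empirical rule S = D^(3/2)."""
    return d ** 1.5


for d, s in combos:
    print(f"D = {d:.2f}: D^(3/2) = {implied_separation(d):.3f}, quoted S = {s:.2f}")
```

Each computed value falls within one point (on the ×100 scale) of the quoted S.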

Table 2.3 Guidelines for categorizing values of dissimilarity and separation (×100) from low to very high under conditions of prototypical segregation (D-S agreement)

The empirical relationship of D and S in situations of dispersed unevenness and situations that are intermediate between polarized unevenness and dispersed unevenness cannot be summarized as easily, but we offer the following guidelines.

  • When $S \geq D^{3/2}$, the pattern of segregation can be characterized as prototypical segregation wherein displacement from even distribution produces polarized non-parity areas and a near maximum level of group separation.

  • When $S < D^{3/2}$, the pattern of segregation can be characterized as near-prototypical segregation wherein displacement from even distribution produces both polarized and non-polarized non-parity areas and thus a high, but well-below maximum, level of group separation.

  • When $S < (D - 0.10)^{3/2}$, the pattern of segregation can be characterized as dispersed displacement from even distribution producing many non-parity areas that are relatively close to parity and only a moderate level of group separation.

  • When $S < (D - 0.20)^{3/2}$, the pattern of segregation can be characterized as highly dispersed displacement from even distribution producing non-parity areas that are close to parity and a low level of group separation.
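The guidelines above can be expressed as a simple decision rule. The Python sketch below rests on two assumptions that are not stated in the text: the thresholds are read as a cascade, so each label applies only when the stricter one before it does not, and the base of the power is clamped at zero so that very small values of D do not produce a negative number raised to a fractional power. The function name and labels are illustrative.

```python
def classify_pattern(d, s):
    """Classify a (D, S) pair on the 0-1 scale using the listed thresholds.

    Assumption: the guidelines are read as a cascade, so each label applies
    only when the preceding, stricter condition is not met.
    """
    def threshold(offset):
        # Clamp at zero so a negative base is never raised to the 3/2 power.
        return max(d - offset, 0.0) ** 1.5

    if s >= threshold(0.0):
        return "prototypical"
    if s >= threshold(0.10):
        return "near-prototypical"
    if s >= threshold(0.20):
        return "dispersed"
    return "highly dispersed"


# D and S for Lubbock and Erie as reported in the text; the last two pairs are hypothetical.
examples = {
    "Lubbock": (0.743, 0.580),
    "Erie": (0.752, 0.369),
    "hypothetical A": (0.80, 0.75),
    "hypothetical B": (0.60, 0.30),
}
for name, (d, s) in examples.items():
    print(name, classify_pattern(d, s))
```

Under this reading, the reported values for Lubbock fall in the near-prototypical band and those for Erie in the highly dispersed band. The published Table 2.4 should be treated as the authoritative statement of the cutoffs.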

Next, the purpose of Table 2.4 is to provide a concrete frame of reference for some of the important findings and conclusions we will offer in the analysis chapters to come. For example, we will conclude that White-Black segregation is exceptional in comparison to segregation involving other groups because uneven distribution is much more likely to take the form of polarized unevenness and creates the logical prerequisite conditions for group inequality on stratification outcomes linked to spatial location of residence. In contrast, we will conclude that White-Asian segregation almost never takes the form of prototypical segregation but instead usually involves a pattern of dispersed or highly dispersed unevenness across non-parity areas that are close to parity and thus creates little to no group separation and minimal possibilities for group inequality on location-based stratification outcomes. Similarly, we will conclude that Latino households in new destination areas initially experience segregation from White households in the form of dispersed displacement into non-parity areas that are near parity, not homogeneous Latino areas, and thus the segregation pattern does not produce high levels of separation from White households. But as Latino households in new destinations become established and their presence grows, the pattern of segregation changes and begins to take on the form of near-prototypical segregation with higher levels of residential separation of White and Latino households. These and other important conclusions are not offered in previous research. The table provides a clear guide to our basis for characterizing these patterns and trends in segregation. They are not subjective assessments; they are conclusions guided by an explicit and clear framework for characterizing segregation patterns.

Table 2.4 Rules of thumb for characterizing D-S combinations (×100) as reflecting uneven distribution involving patterns ranging from dispersed displacement to prototypical segregation

2.8 Summary and Overview

We conclude this chapter by acknowledging that study design and methods are not the most scintillating of topics. But these aspects of a study are crucially important to the potential for our study to advance understanding of segregation in U.S. communities beyond its current state. Our study’s contribution is based to a substantial degree on incorporating these recent advances in methods for measuring and analyzing segregation. This is most important in the following areas: distinguishing whether group differences in distribution into below-parity areas produce uneven distribution involving group separation or merely dispersed unevenness; whether values of segregation index scores are unbiased and trustworthy or biased and untrustworthy to some greater or lesser degree; and whether assessments of segregation using unbiased indices take proper account of the fact that locational outcomes for persons residing in households are linked and not independent. These and other issues may not make for exciting reading, but they are important to our ability to document patterns and trends in segregation across U.S. communities more thoughtfully and accurately than has previously been possible. Now, with these crucial issues covered, we turn next to applying these methods to generate empirical findings.