1 Introduction

How do we make sense of income inequality and residential segregation in cities as different as Houston, Hong Kong, and Johannesburg? Finding common ground between cities in disparate national contexts has the potential to illuminate overlooked factors that influence segregation and suggest new directions for study. For example, Melbourne in Australia and Boston in the United States have much in common: near-identical population size, large immigrant populations in primarily white regions, sprawling suburbs, and similar levels of income inequality before redistribution. Yet, our data show that Melbourne is less segregated than not only Boston, but also nearly all American cities. Is Melbourne less segregated only because it is in Australia, or are there characteristics unique to Melbourne and other Australian cities that set them apart? This chapter introduces the most comprehensive international database on segregation by income to date as a tool that can help elucidate such questions.

We use a sample of 194 cities in 14 countries to show the extent of variation in residential segregation by income (hereafter, income segregation) within and between countries. We focus on this difference because it provides crucial insights into the process of comparison. International comparisons compound the number of relevant explanatory factors: the role of government in the housing sector, history of discrimination, and economic structure are all likely to have significant influence on the degree of segregation in cities. Case studies, like the ones in the following chapters of this book, are ideal for analyzing how these factors intersect to shape the socio-spatial structures of a city. However, case studies tend to focus on primate cities and can never zoom out to measure systematic variation across borders. Understanding the role and magnitude of these factors, and identifying useful comparative cases, requires consistent data across countries. We take the first steps towards this kind of analysis.

The chapter consists of two main parts. First, we provide an overview of the theoretical and empirical literature on comparative segregation studies. The review highlights trends in international research and the potential (and limitations) of this kind of work. It also provides a foundation and scope for interpreting our empirical results. The second part is a descriptive analysis of income segregation data. We have been working on expanding the international coverage of comparable data to a diverse set of countries so that the work of adding layers of analysis and understanding can build upon it (Comandon et al. 2018). Figure 2.1 shows the location of cities and countries included in the sample. In nine of these countries, we have spatially small-scale data on income (or some close equivalent) for all large urban areas. In the other five, there is either only one large urban area in the country or we had access to data for a single city. We are still in the early stages of developing the international database, which limits the scope of the analysis to income. However, the results show the potential of these data and of expanding the database.

Fig. 2.1
A world map. The majority of the study's cities are found in North America, Brazil, Europe, South Africa, Australia, and East Asia.

Map showing the location of countries and cities included in the study

For each city, we calculate the dissimilarity index to summarize the metropolitan level of segregation in an intuitive and easily comparable measure. We measure residential segregation between the bottom and top income quintiles and between the bottom and middle-income quintiles. We include both measures to emphasize the dynamics of income inequality. Existing research shows that segregation of the highest income residents tends to drive overall segregation, leading to the implication that greater inequality will translate to greater segregation (Reardon and Bischoff 2011).

Consistent with this trend, we find that segregation between the middle and bottom of the income distribution is lower for all cities within a country and, on average, across countries. The national average segregation between the top and bottom is significantly higher in all countries except Mexico, where it is near identical. We also find that greater income inequality does not necessarily translate to greater segregation, though cities of extreme disparities do fall into this pattern. Cities near the average level of inequality span the entire spectrum of segregation levels. We conclude with a set of recommendations for future comparative research on residential segregation.

2 Challenges of Comparing Segregation Across Borders

Two types of challenges undermine the systematic international comparison of residential segregation in urban areas. The first challenge relates to interpreting the data that we have access to. Even when levels of segregation in two cities are similar, can we truly compare how a working-class household in, for example, Houston and Hong Kong experiences spatial inequality in its city? The second challenge is purely empirical. The data required for comparison are collected and made available in different formats, with different coverage, and often there are no data available at all.

These challenges make the comparison of cities contentious and difficult, but it should not be abandoned. In this section, we review research relevant to the first challenge to frame our empirical approach to international comparison. The review highlights the kind of research this edited volume exemplifies as a path forward, one that combines the complexity that case studies allow with larger-scale data analysis. It also becomes clear that large-scale data analyses lag in coverage and scope, issues relevant to the second challenge. The rest of the chapter focuses on the state of the art concerning this second challenge.

There is an astounding number of factors that make cities more distinct than, for example, countries. Countries, with few exceptions, fit within a system of nation states and have defined, stable borders and central governments. Cities, on the other hand, often have no clear-cut borders. They include municipalities, which have boundaries, and urbanized areas outside those boundaries. Municipal boundaries not only fluctuate, they also matter little for much urban infrastructure and many urban processes (e.g., work commutes). Furthermore, cities are embedded within distinct polities (sometimes at several governmental levels, as is the case in federal systems) that have authority over them, multiply the number of historical paths to urbanization, and tend to change more rapidly than other units of analysis.

This distinctiveness of urban areas has spurred a flourishing theoretical debate about the nature of cities and their comparison. Key questions at the core of these debates include how we understand the relationships and ties between and within cities (Jessop et al. 2008), how to balance individual experience and generalizable analysis (Robinson 2011; Storper and Scott 2016), and how we choose and develop the methods for comparison (Abu-Lughod 2007; Dear 2005; Gough 2012; Robinson 2016). These strands have all grappled with the challenges of using the city as a unit of analysis. Answers range from the poetic nomadism of Simone (2010), who suggests bringing pieces of cities together to form a new, cohesive unit, to the data-driven use of machine learning to map every urban settlement down to the last house (Esch et al. 2017).

These debates have seeped into the study of segregation. Greater emphasis on the significance of spatial scale has given rise to a reassessment of the mechanisms of segregation (Fowler 2016; Schafran 2018; Trounstine 2018) and to methodological innovation (Lloyd et al. 2014; Reardon et al. 2006; Petrović et al. 2018). The growing diversity of cities has displaced dominant binary narratives in favor of multifaceted analysis and greater scrutiny of the role of residential integration (e.g., Clark et al. 2015; de la Roca et al. 2014; Goetz 2018; Musterd 2003). The persistence of segregation and the combination of forms of inequality have widened the lens to include multiple domains (van Ham and Tammaru 2016), including schools (e.g., Bischoff and Tach 2018), housing (e.g., Owens 2019), and infrastructure (e.g., Trounstine 2018). Here, too, answers tend towards the multiplication of methods rather than a coherent framework to study spatial inequality.

This expansion of the study of segregation does not translate easily to an international context. Ethnicity and race, for example, are critical dimensions of segregation that cross borders. They have, however, different meanings and influences depending on a country’s history of racial oppression (Abu-Lughod 1980; Massey and Denton 1993; Telles 2006) and its colonial history (Nightingale 2012). As such, the interaction of race and class will have different undertones in Canada, the United States, and South Africa (e.g., Fong 1996; Johnston et al. 2007). In the multi-racial context that defines many large metropolises today, interactions between groups, their status within a nation (e.g., recent migrants), and the prevalent socioeconomic stratification can further complicate the picture. Quillian (2012), for example, showed that the interactions between three types of segregation—ethno-racial segregation, poverty segregation within ethno-racial groups, and segregation of higher income groups—contributed to the process of spatial concentration of poverty. Reproducing studies of this complexity and scope in multiple countries not only requires much data, it also requires an intimate understanding of how these factors interact in the local context. This edited volume takes a significant step in that direction by balancing local knowledge, geographical scope, and complexity.

Trounstine (2018) highlighted another dimension that needs systematic engagement. While researchers often summarize segregation as a single index, segregation operates within jurisdictionally defined units that have greater relevance for residents’ well-being. She showed that neighborhood racial segregation is going down in many regions of the United States, but is being reinforced at the municipal level with far-reaching implications for access to critical services (see also Bischoff 2008; Fennell 2009). The chapters in Lloyd et al.’s (2014) edited volume make a similar point, though they emphasize how single-index summaries obscure much of the variation that gives segregation meaning. As Hwang’s (2014) chapter demonstrates, and in a reversal of our initial question, two cities can be very similar in many respects, and yet have entirely different outcomes in terms of segregation.

Recent innovations in the field of segregation studies have advanced our understanding of spatial inequality in a small set of cities and countries. However, there is a long way to go for large-scale comparative work to catch up to these refinements. Existing comparative studies tend to be regionally defined (e.g., Musterd et al. 2017; Tammaru et al. 2020 for Europe) or a wide-ranging selection of individual case studies that emphasize the distinct features of each (Maloutas and Fujita 2012). Some comparative approaches have focused on specific aspects, such as race (Fong 1996) or the role of different types of welfare states (Arbaci 2007). What is missing, including from this review, is the systematic integration of knowledge that does not derive from the hegemonic Anglo-Saxon framework of understanding. As access to data expands to include countries from outside the Global North, more needs to be done to interrogate the assumptions that decades of dominance by American scholarship embedded in the methods and in the analytical lenses that we use.

3 Method and Data

What we generally understand as cities are more accurately described as urban regions. Regions are the sum of urban areas that make up a relatively unified labor and housing market (Storper et al. 2015). They are the appropriate scale of study for segregation because urban regions often represent regional housing markets. For example, when someone gets a new job in the Sydney central business district, they are not constrained to living in the city proper. They may elect or, in fact, only be able to afford to live in a distant suburb. Residential segregation is the sum of this process of sorting across administrative boundaries and should, therefore, be studied at the scale that matches the process.

The first step in comparing cities, then, is to establish their boundaries. However, even this step proves challenging. The norm is to use commuting patterns to estimate the extent of the regional market (OECD 2012). Basically, a functional urban area is the sum of all urban clusters where a substantial share (15%) of residents commute to the largest cities in the region. The lack of such data in many countries has led researchers to look for alternatives to achieve the consistency that is essential to robust results (Bosker et al. 2018).
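The commuting rule described above can be sketched in a few lines. This is only an illustration of the 15% threshold: the `Cluster` fields, the function name, and all figures are hypothetical, and the actual OECD procedure involves further steps that are not reproduced here.

```python
from dataclasses import dataclass

# Share of residents commuting to the core, as cited in the text (15%).
COMMUTE_THRESHOLD = 0.15


@dataclass
class Cluster:
    name: str                # hypothetical urban cluster
    residents: int           # residents counted for the commuting share (assumed field)
    commuters_to_core: int   # residents commuting to the largest city (assumed field)


def functional_urban_area(core_name, clusters, threshold=COMMUTE_THRESHOLD):
    """Return the core plus every cluster whose commuting share meets the threshold."""
    members = [core_name]
    for cluster in clusters:
        if cluster.residents == 0:
            continue  # no residents, so no commuting share can be computed
        share = cluster.commuters_to_core / cluster.residents
        if share >= threshold:
            members.append(cluster.name)
    return members


# Invented figures for illustration only.
clusters = [
    Cluster("Suburb A", 10_000, 4_000),   # 40% commute to the core
    Cluster("Town B", 8_000, 1_600),      # 20%
    Cluster("Village C", 2_000, 100),     # 5%, excluded
]
fua = functional_urban_area("Core city", clusters)
```

In this toy case the region would consist of the core city, Suburb A, and Town B, while Village C falls below the threshold and stays outside the boundary.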

For this study, we use the OECD harmonized database of Functional Urban Areas (FUAs). The OECD database covers Australia, Canada, Denmark, France, Ireland, Mexico, the Netherlands, the United Kingdom, and the United States. For South Africa, New Zealand, and Brazil, which are not in the database, we use an alternative definition (based on administrative definitions) closest to the scale of the region. We limit the sample to FUAs and regions with a population over 500,000 people to ensure each city in the sample has sufficient data coverage in every country. This gives us a sample of 194 urban regions in a total of 14 countries. In five countries, however, the data include only one city, either due to data availability (Japan) or because the country has only one large FUA (Denmark, Hong Kong, Ireland, New Zealand).

For each of these countries, a further obstacle is the differences in data type, spatial scale, and data collection methodology. Some differences are easily, although not perfectly, remedied. For example, France collects income data as decile threshold values. Each tract is assigned Euro-denominated values that correspond to each 10% of the population of the tract. For example, if the 10% of the population with the lowest incomes have incomes below €8500, that is the value reported in the data. The problem is that the values are not comparable across tracts because the income is not relative to a fixed point. In contrast, all other countries define a set of income categories based on fixed ranges and report the number of households that fall within each range. In Canada, for example, the first of 15 income categories ranges from $0 to $5000. We address these differences through a mathematical transformation that uses the information about decile values to estimate how many households fall within the income categories we defined.
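To make the idea of such a transformation concrete, the sketch below converts one tract's decile thresholds into estimated household counts for fixed income categories. The chapter does not specify the transformation used, so the linear interpolation of the cumulative distribution here is an assumption, as are the function names and every number in the example.

```python
from bisect import bisect_right


def share_below(income, deciles):
    """Estimated share of households with income below `income`.

    `deciles` holds the nine threshold values (10%, ..., 90%) of one tract.
    Assumption: the cumulative distribution is linear between thresholds
    and starts at an income of 0.
    """
    xs = [0.0] + list(deciles)                 # incomes at 0%, 10%, ..., 90%
    ps = [i / 10 for i in range(len(xs))]      # matching cumulative shares
    if income <= xs[0]:
        return 0.0
    if income >= xs[-1]:
        return ps[-1]                          # nothing is known above the 9th decile
    j = bisect_right(xs, income)               # xs[j - 1] <= income < xs[j]
    frac = (income - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ps[j - 1] + frac * (ps[j] - ps[j - 1])


def households_by_category(deciles, n_households, edges):
    """Estimated households per category.

    `edges` are ascending upper bounds of the closed categories; a final
    open-ended category collects everything above the last edge.
    """
    cum = [share_below(edge, deciles) for edge in edges]
    shares = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])] + [1.0 - cum[-1]]
    return [n_households * s for s in shares]


# Invented tract: decile thresholds in euros and 1,000 households.
deciles = [8500, 12000, 15000, 18000, 21000, 25000, 30000, 37000, 50000]
counts = households_by_category(deciles, 1000, edges=[10000, 20000, 40000])
```

The result is a list of four estimated counts (below 10,000; 10,000 to 20,000; 20,000 to 40,000; above 40,000) that sum to the tract's household total, which puts the tract on the same footing as countries reporting fixed categories.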

More troublesome are the differences in the spatial scale of small spatial areas, and their coverage. Ideally, we would have data reported at a consistent scale, with full geographic coverage of the region, and based on the full census of the population. Much of our work has been devoted to identifying the differences in data and correcting them where possible. Throughout, we refer to the baseline geographic unit as the tract. This is the neighborhood-scale unit the United States Census Bureau uses and has an equivalent in most countries in our sample. We summarize the data format in Table 2.1.

Table 2.1 Summary of

Some countries have rules about the minimum number of households that must be in a tract before the data can be released (due to privacy concerns). This often makes the coverage sparser outside the urban core of a region. In these cases (France, Canada, and the Netherlands), we complement the tract data with the next smallest administrative unit, which ends up being about the same size in terms of population, though not geographically. Differences between countries are more difficult to bypass. In some cases, a full range of spatial scales is available, and we can pick the one most consistent with the average size of the tract in other countries. However, we are sometimes stuck with a spatial unit that is either larger or smaller than the tract. For example, the two smallest administrative units in Australia, SA1 and SA2, straddle the tract size. SA1 is smaller, the equivalent of a few square blocks. SA2 works in some dense areas but is too large for the lower-density suburbs.

Differences in the spatial scale used to calculate segregation indexes will have an impact on the calculated values. The differences in the unit of analysis in our sample are not so large that they would lead to a reinterpretation of the broad patterns that we describe (Wong 2004; Manley et al. 2019). The countries for which the scale of the geographic unit is of greatest concern are Brazil and South Africa. The two countries have the highest levels of segregation, and the small scale of their units may bias the estimates upward. However, the results are consistent with other methods that minimize the effect of scale (Comandon et al. 2018). The two countries also have some of the highest levels of income inequality and, in the case of South Africa, a history of violent segregation that substantiates the high observed levels.

Differences in the timing of the census add another concern. In cross-sectional studies like this one, time is an issue only to the extent that levels of segregation have likely changed in the intervening years (i.e., we do not integrate comparison between years), thus undermining the comparison. Mexico, Japan, and the Netherlands are the only countries with data pre-dating 2010. Mexico, in particular, is problematic because the country and its cities have changed more rapidly than most other countries in our sample. Results from countries where longitudinal data are available (e.g., United States) confirm that even in countries with more stable urbanization rates, large changes occur (Comandon et al. 2018). Cases like Mexico illustrate the limitation of relying on census data, which tend to change substantially over 10-year periods. However, such changes, on average, tend not to alter the overall interpretation, especially for national trends (Monkkonen et al. 2018).

As part of the data summary, we include the income definition and the method of data collection. Differences in income definition are relatively benign. The main differences relate to whether total income is reported or after-tax income, and to the composition of income. Generous income redistribution programs and higher tax rates will alter the income distribution from pre-tax to after-tax. However, redistributive programs should not shift the income distribution so much that the relative position of households changes drastically (i.e., a household in the bottom quintile of the pre-tax distribution is unlikely to end up in the middle quintile of post-tax distribution). We therefore assume this difference is negligible for our purposes. As a verification, we use the Canadian data to calculate segregation indexes for both total and after-tax income and find a near-perfect correlation.

Of greater consequence are the differences in cost of living of each region. The comparison of households in the bottom income category in Buffalo, New York with those of San Francisco, California is distorted by large differences in the cost of living between these two cities. The median home, for example, costs 6 times as much in San Francisco as in the Buffalo region. While we cannot entirely account for these differences, we adjust the index in every city to calculate segregation between local income quintiles rather than set income categories that remain the same in all locations.

Income quintiles allow us to divide the population into meaningful categories. We use three quintiles for comparison. The bottom quintile includes households who are in the first 20% of the income distribution, meaning that 80% of the population has higher household incomes than they do. These are people that are severely constrained in terms of where they can live within the urban region and have incomes much below the regional median. The middle quintile are households between 40 and 60%. This category includes the median and represents the middle-class. Finally, the top quintile are those with income higher than 80% of the population and have the greatest choice in where to live.

We use a relatively crude method to pick the quintile thresholds. We estimate the income distribution for the entire region and pick the income category bounds closest to the quintile thresholds. For example, the first (bottom) quintile threshold of the income distribution of New York is $23,200. It falls within the income category bounded by $20,000 and $25,000; therefore, we use all households with income below $25,000 in every tract as the bottom quintile. If the quintile value were to fall closer to the lower bound, we would use all households with income below $20,000. The results are an approximation of income quintiles, but one that, on average, is close enough to separate the population into relevant categories.
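The threshold selection can be sketched as follows. The snapping of the quintile value to the nearest category bound follows the text; the way the regional quintile value itself is estimated (linear interpolation within the category) is an assumption, and the category bounds and counts are invented so that the estimate reproduces the $23,200 figure from the New York example.

```python
def estimate_quantile(bounds, counts, q):
    """Estimate the income at quantile `q` from grouped regional data.

    `bounds` are category edges (len(counts) + 1 values). Assumption:
    incomes are spread evenly within each category, and every category
    is bounded (a real top category is usually open-ended).
    """
    target = q * sum(counts)
    cumulative = 0
    for lower, upper, n in zip(bounds, bounds[1:], counts):
        if n > 0 and cumulative + n >= target:
            return lower + (target - cumulative) / n * (upper - lower)
        cumulative += n
    return bounds[-1]


def nearest_bound(value, bounds):
    """Category bound closest to the estimated quintile value."""
    return min(bounds, key=lambda b: abs(b - value))


def households_below(tract_counts, bounds, cutoff):
    """Households in one tract whose category lies entirely below `cutoff`."""
    return sum(n for upper, n in zip(bounds[1:], tract_counts) if upper <= cutoff)


# Invented regional distribution (1,000 households in six categories).
bounds = [0, 10_000, 15_000, 20_000, 25_000, 35_000, 50_000]
region_counts = [50, 40, 46, 100, 300, 464]

q20 = estimate_quantile(bounds, region_counts, 0.20)   # about 23,200
cutoff = nearest_bound(q20, bounds)                    # 25,000 is closer than 20,000

# Bottom-quintile households in one invented tract.
tract_counts = [5, 4, 6, 10, 30, 45]
bottom_in_tract = households_below(tract_counts, bounds, cutoff)
```

Because the cutoff snaps to a category bound, the resulting "quintile" in this example holds 23.6% of regional households rather than exactly 20%, which illustrates why the method yields an approximation.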

The last cause of concern is the method of data collection. Census offices tend to collect a comprehensive set of data on the full population and then collect a more extensive survey on a subset of the population. Income data usually come from the more extensive survey (exceptions include Australia, which has comprehensive coverage). Overall, the countries we include tend to have higher sampling rates, making the United States the case of greatest concern. The use of samples to collect this information means that all reported numbers are estimates that come with margins of error. In small tracts and in places with small total population, these margins of error can be large enough to completely undermine the reliability of segregation indexes like the Dissimilarity Index (DI) (Napierala and Denton 2017). Issues with the sampling strategy and sample size can be corrected to some extent, using simulation techniques, for example, but the type of data we use reduces such concerns.

The concerns about margins of error tend to stem from the underrepresentation of a group of interest. However, since we are interested in income classes, we can expect a relatively even distribution throughout the region. Furthermore, our strategy aggregates income groups to obtain quintiles, which further reduces the incidence of tracts with little to no representation. As an additional attempt to put to rest some of these concerns, we use a method Reardon et al. (2018) developed to correct for the bias in ordinal segregation indexes that comes from the sampling method. This is not the same index, but the magnitude of the correction should parallel what we would find for the DI. The correction for income data in the United States is up to 10% of the estimated value, but much smaller in Canada, which has a sampling rate of 25% (in contrast to around 8% in the United States; see Table 2.1 for an overview of sampling rates).

Like the rest of the book, we used the Dissimilarity Index (DI) as our measure of segregation. We used the dissim function in the “seg” package in R to calculate the index for every city in the sample (Hong et al. 2019). We ran the operation for the bottom and middle quintile, and the bottom and top quintiles. The dissimilarity index has many shortcomings (e.g., Napierala and Denton 2017; Reardon et al. 2006), but remains useful as an intuitive indicator of a city’s spatial structure. In interpreting the index, however, it is important to keep in mind that there is no such thing as no segregation, nor is the absence of segregation desirable (Ellickson 2006). There is a level of segregation that would always be present purely by virtue of the distribution of the housing stock and the impossibility of restricting people’s residential choice (Sander and Kucheva 2016). Massey and Denton (1993) have therefore proposed a generally agreed upon rule of thumb for what constitutes low (0.2–0.3), medium (0.3–0.5), and high levels (>0.5) of segregation. In our interpretation, however, we rely more on relative levels than on the values themselves.
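For readers unfamiliar with the index, the standard two-group formula is half the sum, over tracts, of the absolute difference between each tract's share of the first group and its share of the second group. The sketch below is an independent illustration of that formula, not the code of the R "seg" package used for the chapter, and the tract counts are invented.

```python
def dissimilarity_index(group_a, group_b):
    """Two-group dissimilarity index from per-tract counts.

    D = 0.5 * sum_i | a_i / A - b_i / B |, where a_i and b_i are the counts
    of each group in tract i and A and B are the regional totals.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per tract")
    total_a, total_b = sum(group_a), sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one household")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Invented counts of bottom- and top-quintile households in four tracts.
bottom = [80, 60, 40, 20]
top = [20, 40, 60, 80]
top_di = dissimilarity_index(bottom, top)
```

With these figures the index is 0.4: 40% of either group would have to move to a different tract for the two groups to be distributed identically. The index is 0 when both groups have the same distribution across tracts and 1 when no tract contains members of both.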

4 Results

The sets of DI values show that residential segregation between the top and bottom income groups is much higher than between the bottom and the middle-income group (henceforth, we compare other income groups with the bottom category as the reference point, i.e., we refer only to top DI and middle DI to indicate how segregated they are from the bottom group). The average middle DI is 0.26 compared to 0.48 for the top. Figure 2.2 shows these differences in magnitude across and within countries. Variation in top DI (excluding single-city countries) across countries is also larger than it is within any country. The difference between the highest and lowest national median top DI is 0.42, much more than the largest within-country range of about 0.25. In contrast, the variation between country medians for middle DI is 0.09, while the widest national range, in Brazil, is about 0.25, and six countries have ranges above 0.09.

Fig. 2.2
2 box plots. The left plot for the bottom and top quintiles has the highest value for South Africa, which ranges approximately from 0.6 to 0.7. The right plot for the bottom and middle quintiles has the highest value for Brazil, which ranges approximately from 0.25 to 0.35.

Box plots showing the variation in segregation levels between bottom and top income quintile (left panel) and bottom and middle-income quintile (right panel). The light gray points indicate the median DI for the other comparison group (i.e., in the left panel, they show the median value of the country in the right panel)

Individual cities highlight these differences. The lowest top DI is Tokyo at 0.2 compared to the high of 0.73 in Tshwane. The same comparison for middle DI between Copenhagen (0.1) and Santos (0.4) shows the overall smaller range of variation. It should also be noted that the principal cities of many countries are absent from the extremes. Large economic centers tend to concentrate extremes of wealth and poverty (e.g., Paris) but, in most cases, the largest economic centers (e.g., Toronto, New York, Johannesburg, and London) are closer to the national median.

In comparing top and middle DI, we note that the relationship is unstable. Mexico is the only country where the median top DI falls within the range of middle DI. In other words, it is the only country without a significant shift between middle and top DI. South Africa, by contrast, has the highest median top DI and one of the lowest middle DI. The extremely skewed income distribution in South Africa paired with a history of institutionalized racial segregation has created cities of large wealthy enclaves surrounded by areas of relatively mixed middle and lower incomes (Murray 2011).

The lack of correlation between national top and middle segregation levels contrasts with greater stability in the relative position of individual cities. The correlation between cities’ top DI and middle DI is 0.63. Figure 2.3 illustrates the relationship between middle and top DI. The figure plots the rank of cities according to their top and middle DI, giving an overview of the stability of their relative position. The 45-degree line represents no difference in rank. Many cities are close to this line, indicating that cities with lower middle DI tend to also have lower top DI. In addition, most cities (three quarters) remain either below or above the median national level, as shown by the white symbols.
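The rank comparison can be reproduced with a short sketch. It assumes the reported correlation is a Pearson coefficient, which the text does not state, and uses five invented cities; rank 1 is the lowest DI, as in the figure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the lowest value; ties are broken by input order (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Invented DI values for five cities.
middle_di = [0.10, 0.20, 0.25, 0.30, 0.40]
top_di_values = [0.30, 0.25, 0.50, 0.45, 0.70]

correlation = pearson(middle_di, top_di_values)
middle_rank = ranks(middle_di)
top_rank = ranks(top_di_values)

# Positive shift: the city ranks higher on top DI than on middle DI
# (it would sit above the 45-degree line in a rank-rank plot).
rank_shift = [t - m for t, m in zip(top_rank, middle_rank)]
```

A city with a shift of zero sits on the 45-degree line; large positive or negative shifts correspond to the cities that move substantially between the two rankings.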

Fig. 2.3
A line graph for the top DI rank versus the middle DI rank plots a linear diagonal line. The cities that remained above and below the national median line are prominent in the top right and bottom left, respectively.

Plot of top and middle DI rank for every city. The cities are ranked from lowest DI (i.e., rank 1) value to highest (rank 196). Points on or near the 45-degree line are cities that have the same rank for both types of segregation. Cities above the line have a higher top DI rank than middle DI rank, cities below the line are the reverse. Each symbol represents a type of relationship between individual city DI and national median, either staying higher/lower than the national median or moving up or down relative to the other cities in the country

There is, however, substantial movement in a subset of cities. In Brazil, Florianópolis is one of the least segregated cities between the lower and middle quintiles, but one of the most segregated cities when comparing the top and bottom groups (from rank 182 to 37). Norte/Nordeste Catarinense displays the opposite relationship: its rank shifts down from third highest middle DI to 54 for top DI. Similar trends are present in the United States and in the entire sample, which is split nearly in half between cities moving up and down the ranks.

As noted in the case of South Africa, the level of inequality has the potential to significantly affect segregation. Figure 2.4 shows the estimated city-level GINI as well as the national level (the two are strongly correlated). The bivariate regression line shows a positive relationship between inequality and top DI. However, the cluster of high-inequality cities (Brazil and South Africa) seems to drive the overall trend. For cities and countries with GINI coefficients below 5, the relationship is more ambiguous. Hong Kong and Tijuana have similar levels of income inequality but the top DI in Hong Kong is nearly three times as high as that of Tijuana. There is also significant national clustering. The dots above Tijuana are Mexican cities that display a similar relationship of high inequality and low segregation. This may be a result of the unique dynamics of movement between central city and periphery in the Mexican context (Monkkonen et al. 2018).
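The two quantities behind this comparison can be sketched briefly. The Gini function below works from individual incomes, which is an assumption made for simplicity (the chapter estimates city-level values from grouped census data), and the regression example uses invented city values, with an illustrative cutoff of 50 on a 0 to 100 scale, to show how a high-inequality cluster can drive a bivariate slope.

```python
def gini(incomes):
    """Gini coefficient (0 to 1) from a list of individual incomes."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def ols_slope(x, y):
    """Slope of the bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Invented city-level values: Gini on a 0-100 scale and top DI.
city_gini = [30, 35, 40, 45, 60, 65]
city_top_di = [0.40, 0.30, 0.45, 0.35, 0.65, 0.70]

slope_all = ols_slope(city_gini, city_top_di)

# Refit without the high-inequality cluster (illustrative cutoff of 50).
low = [(g, d) for g, d in zip(city_gini, city_top_di) if g < 50]
slope_low = ols_slope([g for g, _ in low], [d for _, d in low])
```

In this invented sample the slope is clearly positive for all six cities but essentially zero once the two high-inequality cities are dropped, which is the kind of pattern the paragraph describes.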

Fig. 2.4
A graph for DI of bottom and top quintiles versus GINI plots a linear, right-slanted line. The highest values around 0.6 are for the cities with GINI coefficients of 55.

Plot of top DI and GINI coefficient for all cities with income data (excludes UK, Ireland, and the Netherlands). The symbols are the national GINI coefficient rounded to the closest multiple of 5

Finally, the data also shed light on how we understand cities in comparative perspective. These data will gain greater meaning once paired with more detailed contextual analysis. The high levels of segregation in Brazil may not come as a surprise, but the method points to factors other than income inequality. How we define cities matters. In the Brazilian context, and in South Africa to a lesser extent, the region encompasses and concentrates the extremes of the country. Some of the regions, such as Manaus, Brazil, include large hinterland areas that have often been marginalized in the process of rapid urbanization (Kanai 2014). This combines with the landscape of urban inequality in the urban core to create a layering with no direct parallel in the well-established cities of Europe and North America.

5 Discussion and Conclusion

Now that we have examined the data, we can come back to the questions we opened with. Despite the lack of contextual data about individual cities and countries, there is much we can infer from the observed patterns. We learn that income inequality, certainly high inequality, correlates with residential segregation between income groups but leaves much variation unexplained. Many cities with similar levels of income inequality, even within the same country, have segregation levels at opposite ends of the spectrum. While differences in segregation are not as large within countries as they are between countries, our data suggest that both need to be studied in conjunction. Therefore, it is not just a matter of understanding the difference between Melbourne and Boston as cities within countries with different political economies; it also matters that segregation levels in all Australian cities are similar while in the United States they vary from Australian to Brazilian levels. This relationship between inequality and segregation raises important substantive and methodological questions for between-country and within-country comparisons.

The case of Mexico, for example, stands out on its own because of its prevalent low levels of segregation by income, but it raises additional questions in comparison to other middle-income, high-inequality countries such as South Africa and Brazil. South Africa shares more with Mexico when the comparison is limited to middle-income segregation, where both countries have lower levels of segregation. It may be that limitations in Mexico’s data fail to capture the translation of high inequality into spatial patterns of separation, or that Mexican cities developed in such a way that the isolation of wealthy households that defines South African cities has not taken hold to the same extent.

Elucidating these questions has implications for policies aiming to reduce segregation. There is too little evidence to speculate as to the importance of national welfare systems in explaining the differences in within-country variation. Some countries, like France, have a more centralized system of urban governance and comprehensive redistributive systems, yet the range of income segregation levels resembles that of the United States more closely than that of the Netherlands. This can be interpreted as evidence that differences between cities are more important than national-level differences. The lower average level of segregation in countries with greater income equality may point in the other direction. Here too, however, important deviations prevent straightforward inference.

A key aspect of explaining deviations from general relationships is history. Single-country studies have revealed important processes that persistently shaped segregation. In the United States, for example, scholars have shown how the migration of black people out of the South led to the creation of modern residential racial segregation. Where the social hierarchy was institutionalized in the South under slavery and Jim Crow laws, the rest of the country lacked such rules to establish white dominance and relied instead on the systematic spatial separation of black migrants (Logan 2017). To this day, southern cities have lower levels of segregation than the rest of the country. Nightingale (2012) showed that a similar process operated in South Africa, where residential segregation was unnecessary until a large black labor force developed in urban centers. The data we presented can serve to conceptualize historical processes more broadly to include the implications of different starting points of urbanization, colonial relations, and changing economic relations (e.g., centered on labor and property).

One promising area of study is the integration of economic geography into the study of segregation. As cities take on different roles in relation to their international peers and national competitors, the pressures on urban structures will differ. A large-scale data analysis would allow for the modeling of urban systems to consider sub-national labor market trends and the accompanying migration patterns. It would also open the possibility of studying the role of different historical trajectories of cities. The period in which urbanization takes place and the set of events that shape the life of a city matter, but studying such phenomena as case studies can lead to self-fulfilling analysis if one chooses cases based on the outcome one wishes to study (Abu‐Lughod 2007). The trends in urban research and segregation studies point to fruitful complementarities between case studies and large-scale data analyses.

Many of the questions we have raised point to the importance of data spanning several time periods for future research. The United States is the only country for which we have access to reliable data over time for the entire country. Even in that case, issues of comparison over time are non-negligible (Reardon et al. 2018). Countries are improving their data collection methods with every census, and we hope that, as time passes, the scope for comparison will continue to grow.