Comparing ethnic segregation across cities—measurement issues matter

This paper investigates how important measurement issues such as the modifiable areal unit problem (MAUP), random unevenness and spatial autocorrelation affect cross-sectional studies of ethnic segregation. We use geocoded data for German cities to investigate the impact of these measurement problems on the average level of segregation and on the ranking of cities. The findings on the average level of residential segregation turn out to be rather robust. The ranking of cities is, however, sensitive to the assumptions regarding reallocation of population across neighbourhoods that the use of different segregation measures involves. Moreover, the results suggest that standard aspatial approaches tend to underrate the degree of segregation because they ignore the spatial clustering of ethnic groups. In contrast, non-consideration of random unevenness gives rise to a moderate upward bias of the mean segregation level and involves minor changes in the ranking of cities if the minority group is large. However, the importance of random segregation significantly increases as the size of the minority group declines. If the size of specific ethnic groups differs across regions, this may also affect the ranking of regions. Thus, the necessity to properly account for measurement issues increases as segregation analyses become more detailed and consider specific (small) minority groups.


Introduction
Rising international labour migration and high numbers of asylum seekers have recently sparked an increasingly fierce debate about ethnic segregation and integration in many destination countries. Different theories predict that an influx of new migrants might give rise to significant changes in the level of segregation (Cutler et al. 1999). Moreover, ethnic segregation is supposed to influence economic outcomes of cities and their inhabitants (Cutler and Glaeser 1997). The spatial distribution of immigrants is thus an important characteristic of urban areas, and the ongoing debate about segregation and potential adverse effects resulting from a spatial separation of natives and immigrants calls for credible findings on the extent of ethnic segregation in cities. Reardon and O'Sullivan (2004) stress that a reliable and meaningful measurement of segregation is essential for analyses of its causes and consequences.
There is a voluminous literature on the causes of (ethnic) segregation (e.g. Cutler et al. 1999;Boschman and van Ham 2015;Skifter et al. 2016), on (economic) effects of segregation (Cutler and Glaeser 1997;Damm 2009) and measurement issues (Massey and Denton 1989;Arribas-Bel et al. 2016;Nijkamp and Poot 2015;Musterd and Van Kempen 2009). However, only a few studies provide comprehensive and consistent results on ethnic segregation for a group of cities or regions. Andersson et al. (2018b) note that comparability of the existing evidence tends to be limited due to differences in data acquisition, spatial units and definitions of ethnic categories. In particular, previous studies frequently neglect important measurement issues. The methodological literature primarily addresses the modifiable areal unit problem (MAUP), random unevenness and spatial autocorrelation. This study examines how these measurement problems affect the findings of cross-sectional analyses of ethnic segregation.
The MAUP arises because analyses are usually based on administrative spatial units, such as zip code areas or census tracts. Varying the boundaries for a given spatial system or changing the level of spatial aggregation might influence the values of segregation measures (Krupka 2007). Some authors therefore propose to use geocoded data in segregation studies (Arribas-Bel et al. 2016;Reardon and O'Sullivan 2004). However, robust comparative evidence for cross sections of regions is still limited although geocoded information is increasingly available for European countries (Andersson et al. 2018a). We investigate ethnic segregation in German cities based on geocoded data, which allows us to account for the MAUP. Moreover, we compare our findings with the results of previous studies that make use of administrative spatial units in order to evaluate the severity of the MAUP for cross sectional analyses. Massey and Denton (1988) argue that ethnic segregation is a phenomenon with different dimensions and distinguish evenness, exposure, concentration, centralization, and clustering. Johnston et al. (2011) note that most analyses focus on evenness and exposure. Some studies compare the results of different segregation measures and conclude that they seem to produce similar findings (e.g. Bailey 2012;Massey and Denton 1989). However, standard measures of evenness and exposure are aspatial, i.e. they ignore spatial relationships among residential locations (Reardon and O'Sullivan 2004). In this analysis, we consider the most important dimensions of segregation, evenness and exposure, and investigate whether the spatial clustering of ethnic groups affects the corresponding results.
Finally, Carrington and Troske (1997) note that standard measures of residential segregation might indicate a substantial degree of segregation even if individuals are randomly allocated across spatial units. They propose modified indices that measure the deviation from a random allocation instead of distance from evenness. Analysing residential segregation in Auckland, Maré et al. (2012) take into account both random segregation and spatial autocorrelation. However, the authors do not consider how corresponding measurement issues affect a comparative analysis of residential segregation in a cross-sectional setting. This also applies to a study by Glitz (2014) who examines ethnic based sorting across municipalities in Germany.
To summarize, while there is an extensive literature on different measurement problems, evidence on the severity of these issues in applied research is scarce. Often studies focus on one specific measurement issue and do not provide comparative evidence on how the most important problems affect cross-sectional segregation analyses. Previous studies rather examine differences in the degree of segregation across distinct ethnic groups (e.g. Johnston et al. 2011;Massey and Denton 1989) than disparities between regions. Our analysis examines how the MAUP, spatial autocorrelation and random segregation influence cross-sectional evidence on ethnic segregation and, thereby, adds to a limited literature on regional disparities in residential segregation.
Focusing on the most common indicators, we analyse whether the average level of segregation and the ranking of cities varies across different measures and dimensions of segregation. In order to investigate whether the above-mentioned measurement issues matter in applied research, we also contrast our results with findings of earlier studies summarized in Helbig and Jähnen (2018). The use of a consistent database and detailed geocoded information allows us to compare the level of ethnic segregation across cities and to assess the effects of the MAUP, spatial autocorrelation and random segregation. The findings should thus provide some guidance regarding methodological requirements of segregation analyses. Moreover, we evaluate the accuracy of previous research, which ignored these problems partly due to limited data availability.
The rest of the paper is organised as follows. The next section presents a brief survey of the literature with a focus on studies on ethnic segregation dealing with measurement issues. Sect. 3 describes the data set and Sect. 4 the applied indicators of ethnic segregation. We discuss the results of our analyses in Sect. 5 and Sect. 6 concludes.

Literature
A detailed review of the extensive empirical literature on ethnic segregation is beyond the scope of this paper. We focus on studies dealing with measurement issues. There is an intense debate about different dimensions of segregation and appropriate methods of measuring these dimensions (e.g. Massey and Denton 1988; Reardon and O'Sullivan 2004). Massey and Denton (1988) distinguish five different dimensions of segregation, which are thought to complement each other in empirical analyses. Evenness or dissimilarity refers to differences in the distribution of ethnic groups across areas, while exposure indicates the level of potential contact, i.e. the possibility of interaction between distinct groups within given spatial units. Clustering means the extent to which neighbouring areas show a similar demographic composition. The authors also discuss concentration, referring to the area occupied by a minority group, and centralization, as indicated by the distance of the residence to the city centre.
According to Bailey (2012) there is a view that various segregation measures are closely related and tend to produce similar findings. Massey and Denton (1989) examine the correlation of measures for a cross section of U.S. metropolitan areas and conclude that although there is an empirical overlap of the five dimensions of segregation, corresponding indices do not perfectly replicate one another. Johnston et al. (2011) note, however, that most studies focus on two dimensions, evenness and exposure. They re-examine the five dimensions using more recent data and identify only two independent dimensions, which they term separateness and location. Altogether, only a small number of studies provide comparative evidence on the different dimensions of segregation for cross sections of regions. Studies that consider different dimensions and measures of segregation often focus on a specific region and investigate differences in segregation patterns across ethnic groups rather than across regions (e.g. Maré et al. 2012; Brown and Chung 2006).
There is a parallel discussion about the appropriateness and drawbacks of different segregation measures. An important issue is the aspatial nature of the most commonly used measures. Reardon and O'Sullivan (2004) note that the indices suffer from two main problems: the checkerboard problem and the MAUP. The latter results from the fact that frequently the units of observation are administrative spatial units such as zip code areas or census tracts (Arribas-Bel et al. 2016; Östh et al. 2014). Van Mourik et al. (1989) note that the widely used dissimilarity index introduced by Duncan and Duncan (1955) varies with the degree of disaggregation of the data and increases as units of observation become smaller. Evidence for U.S. cities provided by Krupka (2007) suggests that the MAUP results in biased estimates of the relationship between the level of segregation and city size. The positive correlation appears to be at least partly spurious. Krupka (2007) argues that the relationship between city size and segregation, which different studies identify for the U.S., is an artefact of the data collection process. Frequently census tracts are used, which tend to be smaller in large urban areas as compared to small cities in the U.S.
The second drawback of the most common measures is that they ignore the relative location of areas. This is often referred to as the checkerboard problem (Johnston et al. 2011). Aspatial measures compare the ethnic composition of neighbourhoods irrespective of their spatial proximity. However, the level of segregation differs depending on whether similar neighbourhoods cluster in space or not (Reardon and O'Sullivan 2004). Some authors therefore make use of measures of spatial autocorrelation to account for this issue. Johnston et al. (2011) apply global measures of spatial autocorrelation to investigate ethnic segregation in Auckland and detect more pronounced changes in the pattern of segregation over time as compared to an analysis based on aspatial measures. [2] Harris (2014) complements an analysis of ethnic segregation in England based on the dissimilarity index with a related measure that takes into account similarities between adjoining areas. The findings point to decreasing differences in the ethnic composition of those neighbouring areas, which initially showed very dissimilar patterns.
Carrington and Troske (1997) stress that standard segregation measures likely overstate the degree of segregation because a random allocation of individuals across units might be associated with a significant level of unevenness, in particular if units of observation are small. They propose modified measures that consider random unevenness and therefore indicate the extent of systematic segregation that goes beyond the pattern under randomness. Maré et al. (2012) apply this approach and investigate the extent of residential segregation for different ethnic groups in Auckland based on a broad range of indices that capture different aspects of the demographic composition of neighbourhoods. Their findings point to significant unevenness, isolation and spatial clustering of different ethnic groups. However, the results of different measures show a substantial variation. Glitz (2014) uses social security records to examine ethnic based sorting across municipalities in Germany and reports systematic dissimilarity and isolation indices, which control for random segregation. He concludes that residential segregation is pervasive and persistent. However, the analysis does not involve a comparison of the level of segregation across regions and neglects the spatial clustering of ethnic groups. This also applies to a study by Janßen and Schroedter (2007) who analyse segregation in West Germany based on information from the microcensus, a representative one percent sample survey of the population. They use the share of the foreign population in small census districts as an indicator of ethnic segregation. [3]

[2] They note that the checkerboard problem has also been addressed by calculating indices that combine dissimilarity and clustering (e.g., Reardon and O'Sullivan 2004). These measures are, however, difficult to interpret.

[3] Studies on ethnic segregation in (specific) German cities tend to focus on evenness and make use of the dissimilarity index (e.g. Friedrichs 1998).
The most comprehensive evidence for German cities is presented by Helbig and Jähnen (2018) who thoroughly survey the existing empirical literature and calculate segregation indices for 45 German cities based on information for administrative urban districts. However, the authors admit that the MAUP severely limits comparability across cities since the size of the districts significantly varies within and across cities.
Whether and to what extent the above-mentioned methodological issues matter in practice, i.e. when the level of ethnic segregation in a (group of) region(s) is concerned, is only occasionally considered. Among the rare exceptions is a study by Sleutjes et al. (2018) who compare findings for administrative units with results based on an egocentric measurement. Existing evidence suggests that the MAUP, random unevenness and spatial clustering likely affect the findings. Moreover, the dimension of segregation that is analysed might influence the ranking of regions. Using geocoded data for a cross section of German cities, this paper addresses this research gap by investigating different dimensions of ethnic segregation and considering different methodological problems. In particular, we analyse whether the estimated average level of segregation and the ranking of cities are seriously affected by the MAUP, spatial autocorrelation and random segregation.

Data
We investigate ethnic segregation in 83 German cities with at least 100,000 inhabitants using geocoded register data. The register data are collected in the administrative processes of the Federal Employment Agency and maintained in the Integrated Employment Biographies (IEB) of the Institute for Employment Research (IAB). The IEB cover all employed persons who are subject to social security contributions, recipients of unemployment benefit and social welfare, participants in active labour market policy and persons registered as job seekers. Thus, we examine ethnic segregation of the working population (sum of employees and unemployed persons). The employment data only cover workers who are subject to social security contributions. Civil servants and self-employed persons are not included in the IEB. Moreover, non-participation may potentially influence the results because, in particular, the participation rate of foreign women is lower than that of native women. Specific ethnic groups, such as the Turks, mainly drive the difference (see Kogan 2011). Thus, there might be a tendency to underestimate the degree of segregation in regions marked by a high share of migrants. There is some indication of a negative correlation between individual labour market performance and ethnic segregation (Cutler and Glaeser 1997). However, our analysis indicates that the results based on working population data closely resemble findings reported for the entire population (see appendix).
In the IEB the citizenship of the persons is available. We therefore use the information on the nationality of the workers to define ethnic groups. Country of birth is the most widely used indicator in this context. However, this information is not available in most German statistics, and small-scale geocoded data on inhabitants that allow us to distinguish by migration background do not exist for Germany. Based on a survey of the corresponding literature, Helbig and Jähnen (2018) conclude that the majority of analyses of the German context have to rely on citizenship in order to operationalize the ethnic background. Moreover, findings by Klinger et al. (2017) and Dohnke et al. (2012) suggest that using citizenship instead of the migration background to define ethnic groups should not give rise to a major bias in studies of ethnic segregation. For instance, Klinger et al. (2017) show that the distribution of foreign inhabitants and Germans with a migration background across neighbourhoods is very similar.

A geocoded version of the IEB is available for the years 2007-2009. Each person in the IEB is assigned to a square grid cell with a side length of 500 meters based on the residential address corresponding to the person's main spell on June 30 (see Scholz et al. 2012). We use these grid cells as our basic definition of neighbourhood and the cross section for 2009 for our analysis. The number of grid cells with reported residences of workers ranges from 2673 in Berlin to 120 in Offenbach. Altogether, there are 38,148 grid cells and 10,418,323 workers in the data set. Fig. 1 shows that only a few workers reside in the majority of grid cells. The median population is 118.

Fig. 1 Size distribution of grid cells
Employing geocoded data allows us to compare the values of segregation measures across the cities because we avoid the MAUP that might arise from the use of administrative spatial units. In order to examine the importance of the scale effect of the MAUP, we compare the results of segregation indices for different grid sizes. Grid cells of 500-, 1000- and 2000-meter length are used, which are nested in a hierarchical manner (for a similar approach see Wong 2003).
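The hierarchical nesting of grid sizes can be illustrated with a minimal sketch. It assumes that each 500-meter cell is identified by integer column and row indices and that coarser cells are aligned with multiples of these indices. The function name `aggregate_cells` and the example counts are hypothetical and are not taken from the IEB data.

```python
from collections import defaultdict


def aggregate_cells(counts, factor):
    """Merge square grid cells into blocks of factor x factor cells.

    counts maps (col, row) indices of fine cells to (foreign, total) counts.
    A factor of 2 turns 500 m cells into 1000 m cells, a factor of 4 into 2000 m cells.
    """
    coarse = defaultdict(lambda: [0, 0])
    for (col, row), (foreign, total) in counts.items():
        key = (col // factor, row // factor)  # index of the enclosing coarse cell
        coarse[key][0] += foreign
        coarse[key][1] += total
    return {key: tuple(value) for key, value in coarse.items()}


# Hypothetical (foreign, total) worker counts in 500 m cells
cells_500 = {
    (0, 0): (5, 50), (1, 0): (0, 40), (0, 1): (10, 30),
    (1, 1): (1, 80), (2, 0): (3, 20), (3, 3): (2, 10),
}
cells_1000 = aggregate_cells(cells_500, 2)
cells_2000 = aggregate_cells(cells_500, 4)
```

Because every coarse cell is the union of complete finer cells, group totals are preserved at each level, and any change in an index across levels reflects the scale of aggregation only.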

Measurement of segregation
Departing from the multidimensional concept by Massey and Denton (1988) our empirical analysis focuses on distinct dimensions of segregation, namely evenness (dissimilarity), exposure (isolation) and the spatial clustering of ethnic groups. We therefore apply different measures to investigate ethnic segregation. Making use of geocoded data we differentiate neighbourhoods or areas a in cities c.
We examine evenness using the well-known dissimilarity index. It measures the share of people of group g that would have to relocate in order to arrive at a spatial distribution that is identical to that of a reference group. The index for group g in city c is given by:

$$S^{D}_{gc} = \frac{1}{2} \sum_{a \in c} \left| \frac{P_{ga}}{P_{g\cdot}} - \frac{P_{\cdot a} - P_{ga}}{P_{\cdot\cdot} - P_{g\cdot}} \right| \qquad (1)$$

where $P_{ga}$ is the population of group g in area a and a dot ($\cdot$) indicates aggregation over the corresponding index. The dissimilarity index takes its maximum value of one if the group never co-locates with other groups in specific areas in a city. In contrast, the minimum value of zero points to a spatial distribution that coincides with that of the rest of the population. However, a particular value of $S^{D}_{gc}$ can correspond with a dispersed or highly clustered spatial pattern because the distribution of group g is compared with the spatial distribution of a reference group, which might be evenly distributed across space or cluster in specific areas. In Eq. (1), the entire population excluding the group under consideration is the reference. When we consider the spatial sorting of specific ethnic groups, the reference is always the corresponding rest of the population that includes all other ethnic groups.

The dissimilarity index varies with the degree of disaggregation of the data and, thus, an analysis that makes use of information for administrative regions likely suffers from the MAUP (Wong 2003). This is a serious drawback for comparisons across cities if the units of observation are not harmonized. Another problem refers to the assumption on the redistribution of population across areas that the use of this measure involves. The results of the dissimilarity index might imply changes in the population of areas that are not feasible without changing the housing stock in the neighbourhoods (see Nijkamp and Poot 2015; Maré et al. 2012).
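As a minimal sketch of this calculation, the following function computes the dissimilarity index as half the sum of absolute differences between each area's share of the group and its share of the rest of the population. The function name and the two-area example are hypothetical; the inputs are lists of group counts and total counts per area.

```python
def dissimilarity(group, total):
    """Dissimilarity index of a group relative to the rest of the population."""
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    return 0.5 * sum(abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest))


# Hypothetical city with two areas of 20 residents each
d_uneven = dissimilarity([8, 2], [20, 20])       # group shares 0.8/0.2, rest shares 0.4/0.6
d_even = dissimilarity([5, 5], [50, 50])         # identical distributions
d_separated = dissimilarity([10, 0], [10, 10])   # no co-location at all
```

In the first case 40% of the group would have to move to the second area to match the distribution of the rest of the population; the other two cases reproduce the minimum and maximum of the index.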
Moreover, a decreasing index may not necessarily indicate a decline in the degree of segregation if the change of the index is accompanied by a strong increase in the foreign population, which increases the percentage of the foreign population closer to a fraction of one-half. This is important for our analysis because the share of the foreign population differs significantly between East and West German cities. More precisely, the indicated need for redistribution across grids might be overstated for East German cities, as the percentage of the foreign population is relatively low in these regions.
Van Mourik et al. (1989) propose an adjusted dissimilarity index that takes into account this problem. The measure indicates the minimum proportion of group g and the rest of the population, i.e. foreign and native population, that would need to change the area of residence in order to equalize the spatial distributions under the condition that the total population in each area is constant (Nijkamp and Poot 2015) [7]:

$$S^{D,\mathrm{adj}}_{gc} = 2 \, \frac{P_{g\cdot}}{P_{\cdot\cdot}} \left( 1 - \frac{P_{g\cdot}}{P_{\cdot\cdot}} \right) S^{D}_{gc} \qquad (2)$$

The second dimension of segregation that we investigate is exposure. Cutler et al. (1999) note that even if minority groups are disproportionately located in particular areas relative to the majority group, this distribution does not necessarily imply that members of distinct groups have little or no contact. To measure the exposure of natives to immigrants, we apply the so-called isolation index that provides information on the extent to which the considered group dominates the population of a specific neighbourhood. Johnston et al. (2011) note that the measure is also sensitive to the group's proportion in the urban population. It should therefore be modified when making comparisons across cities with varying shares of the foreign population. We apply a modified index of isolation proposed by Cutler et al. (1999):

$$S^{I}_{gc} = \frac{\sum_{a \in c} \omega_{ga} \dfrac{P_{ga}}{P_{\cdot a}} - \dfrac{P_{g\cdot}}{P_{\cdot\cdot}}}{1 - \dfrac{P_{g\cdot}}{P_{\cdot\cdot}}} \qquad (3)$$

with $\omega_{ga} = (P_{ga}/P_{g\cdot})$. The index is based on the average group share experienced by members of group g in city c. The reference distribution is a uniform distribution where in every grid cell the percentage of group g is equal to $(P_{g\cdot}/P_{\cdot\cdot})$. A value of zero indicates that the group is distributed in proportion to the total population. A value of one can be interpreted as complete isolation whereby all members of the group locate in one or several particular areas a, and no member of the rest of the population locates in those areas (Nijkamp and Poot 2015). The maximum value indicates that the group only lives in areas where they account for the entire population.
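The two measures can be sketched as follows. The sketch rests on two assumptions that should be checked against the original sources: the adjusted index is taken to be the standard dissimilarity index multiplied by 2p(1 - p), where p is the group's share in the city population (this follows from holding the population of each area constant), and the modified isolation index is normalised by 1 - p. All function names and example counts are hypothetical.

```python
def dissimilarity(group, total):
    """Standard dissimilarity index relative to the rest of the population."""
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    return 0.5 * sum(abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest))


def adjusted_dissimilarity(group, total):
    """Assumed form: standard index scaled by 2p(1 - p), p = group share in the city."""
    share = sum(group) / sum(total)
    return 2 * share * (1 - share) * dissimilarity(group, total)


def modified_isolation(group, total):
    """Assumed form: (average own-group share experienced - p) / (1 - p)."""
    group_sum = sum(group)
    share = group_sum / sum(total)
    exposure = sum((g / group_sum) * (g / t) for g, t in zip(group, total) if t > 0)
    return (exposure - share) / (1 - share)


group, total = [8, 2], [20, 20]                   # hypothetical two-area city, p = 0.25
adj = adjusted_dissimilarity(group, total)        # 2 * 0.25 * 0.75 * 0.4
# Direct count of movers with constant area populations: sum |P_ga - p * P_a| / P
movers = sum(abs(g - 0.25 * t) for g, t in zip(group, total)) / sum(total)
iso = modified_isolation(group, total)
iso_full = modified_isolation([10, 0], [10, 30])  # group alone in its area
```

The direct count of movers coincides with the scaled index, which is why the scaling factor, and hence the group share, drives the gap between the standard and the adjusted measure.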
The indices introduced above might indicate segregation even if foreigners and natives are randomly assigned to areas, as explained by Carrington and Troske (1997). First, a simple integer constraint might inhibit an even distribution across space. For example, evenness is unobtainable in a city with 50 foreigners and 100 areas. Second, random allocation of foreigners and natives to areas tends to generate some variation resulting in a deviation from evenness. Following Carrington and Troske (1997) we apply systematic segregation indices that indicate the deviation from random segregation rather than from evenness. For this approach, we need to determine the level of segregation under random allocation. As the city population is large relative to the population in each grid in our data set, we can use samples from a binomial distribution to generate random allocations:

$$P_{ga} \sim B(P_{\cdot a}, \, p_g) \qquad (4)$$

where $p_g = (P_{g\cdot}/P_{\cdot\cdot})$ is the share of the foreign population in the city, $P_{\cdot a}$ is the population of the grid cell and $P_{ga}$ is the number of foreigners in the grid cell. Assuming a binomial density function, the share of foreigners will be distributed across grids with mean $p_g$ and variance $p_g (1 - p_g)/P_{\cdot a}$. Based on 100 replications we calculate averages that give us the segregation measure under random allocation $M^{rand}_c$. The systematic measure is then given by:

$$M^{sys}_c = \begin{cases} \dfrac{M_c - M^{rand}_c}{1 - M^{rand}_c} & \text{if } M_c \geq M^{rand}_c \\[2ex] \dfrac{M_c - M^{rand}_c}{M^{rand}_c} & \text{if } M_c < M^{rand}_c \end{cases} \qquad (5)$$

with $M_c$ being the traditional segregation measure. We apply this decomposition approach to the dissimilarity index, the adjusted index and the isolation index. Finally, we consider the significance of spatial clustering.

[7] The modified measure is also sensitive to the degree of spatial aggregation; it does not tackle the MAUP. The modified index has potentially more policy relevance because desegregation via changes in the housing stock of areas might be harder to achieve than changes in the ethnic composition of areas. See Van Mourik et al. (1989) for a corresponding argument with respect to occupation segregation.
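The simulation of random unevenness can be sketched for the dissimilarity index as follows. This is an illustration under stated assumptions, not the procedure applied to the IEB data: binomial draws are generated as sums of Bernoulli trials with Python's standard library, the seed and the hypothetical city of 50 cells with 20 residents each are arbitrary, and the systematic index uses the two-branch normalisation of Carrington and Troske (1997).

```python
import random


def dissimilarity(group, total):
    """Standard dissimilarity index relative to the rest of the population."""
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    return 0.5 * sum(abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest))


def random_dissimilarity(total, share, replications=100, seed=0):
    """Average dissimilarity index when each resident is foreign with probability `share`."""
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        # One binomial draw per cell, built from independent Bernoulli trials
        draw = [sum(rng.random() < share for _ in range(t)) for t in total]
        if 0 < sum(draw) < sum(total):  # skip degenerate draws with an empty group
            values.append(dissimilarity(draw, total))
    return sum(values) / len(values)


def systematic(observed, expected):
    """Deviation from random segregation, following Carrington and Troske (1997)."""
    if observed >= expected:
        return (observed - expected) / (1 - expected)
    return (observed - expected) / expected


cells = [20] * 50                                   # hypothetical city
d_rand_small = random_dissimilarity(cells, 0.05)    # small minority
d_rand_large = random_dissimilarity(cells, 0.40)    # large minority
d_sys = systematic(0.5, 0.2)                        # hypothetical observed and random values
```

With small cells, the expected index under random allocation is clearly above zero and grows as the minority share falls, which is the pattern that makes the correction more important for small groups.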
Both the dissimilarity and the isolation index provide limited information on the spatial distribution of demographic groups in a city because they ignore that areas showing a similar composition might cluster in space. Fig. 2 visualizes this problem for the distribution of two groups in 16 grid cells (for a similar presentation see Arribas-Bel et al. 2016). The aspatial indices detect differences between the random pattern on the left and the segregated patterns in the middle and on the right. However, differences between the two segregated patterns in the middle and on the right-hand side are not identified because the clustering of similar tracts in the right part of Fig. 2 is ignored. Fig. 2 illustrates that a given level of ethnic segregation as indicated by standard indices can be associated with quite different spatial distributions of groups. Common measures of segregation neglect the spatial patterning of separation (Wong 1997; Harris 2014).

Fig. 2 Segregation and spatial clustering
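The checkerboard problem can be made concrete with a small sketch. It uses two hypothetical arrangements of 16 fully segregated cells, which are not a reproduction of Fig. 2: an alternating checkerboard and a pattern in which all minority cells form one block. The share of adjacent cell pairs with the same composition is an ad hoc illustration of clustering and not a measure used in the analysis.

```python
def dissimilarity(group, total):
    """Standard dissimilarity index relative to the rest of the population."""
    rest = [t - g for g, t in zip(group, total)]
    group_sum, rest_sum = sum(group), sum(rest)
    return 0.5 * sum(abs(g / group_sum - r / rest_sum) for g, r in zip(group, rest))


def like_neighbour_share(grid):
    """Share of cell pairs with a common border that have the same composition."""
    rows, cols = len(grid), len(grid[0])
    same = pairs = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbour: each pair once
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pairs += 1
                    same += grid[r][c] == grid[rr][cc]
    return same / pairs


# 1 = cell inhabited only by the minority, 0 = only by the rest; 10 residents per cell
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
clustered = [[1 if c < 2 else 0 for c in range(4)] for r in range(4)]


def index_for(grid):
    group = [10 * value for row in grid for value in row]
    return dissimilarity(group, [10] * 16)


d_checkerboard, d_clustered = index_for(checkerboard), index_for(clustered)
share_checkerboard = like_neighbour_share(checkerboard)
share_clustered = like_neighbour_share(clustered)
```

Both arrangements yield the maximum dissimilarity index, while the neighbour comparison separates them clearly. An aspatial index cannot pick up this difference.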
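A measure that provides information on the spatial clustering is Moran's I. Nijkamp and Poot (2015) propose the following measure where the share of an area in the total population of group g in city c is compared with a uniform distribution $(1/A)$:

$$I_{gc} = \frac{A}{\sum_{a} \sum_{s} w_{as}} \cdot \frac{\sum_{a} \sum_{s} w_{as} \left( \omega_{ga} - \frac{1}{A} \right) \left( \omega_{gs} - \frac{1}{A} \right)}{\sum_{a} \left( \omega_{ga} - \frac{1}{A} \right)^{2}} \qquad (6)$$

where $\omega_{ga} = (P_{ga}/P_{g\cdot})$ and A is the number of areas in city c. Neighbourhood of the areas a and s is determined by the spatial weights $w_{as}$ and can be defined in various ways. [8] Thus, the measure points to the spatial correlation of deviations from a uniform distribution for a specific ethnic group. However, we might also use the distribution of the total population excluding the group under consideration as reference. This corresponds with the dissimilarity index in Eq. (1):

$$\left( \frac{P_{ga}}{P_{g\cdot}} - \frac{P_{\cdot a} - P_{ga}}{P_{\cdot\cdot} - P_{g\cdot}} \right)^{2} \quad \forall a \in c \qquad (7)$$

A positive and significant statistic points to a clustering of ethnic segregation across neighbouring grids.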
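A minimal sketch of Moran's I with binary contiguity weights is given below. It applies the textbook form of the statistic to each area's share in the group population, whose mean across A areas equals 1/A. The 4 x 4 grids, the function names and the restriction to common borders (rook contiguity) are assumptions for illustration; significance testing is omitted.

```python
def rook_weights(rows, cols):
    """Binary contiguity matrix: 1 if two grid cells share a border, else 0."""
    n = rows * cols
    weights = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    weights[r * cols + c][rr * cols + cc] = 1
    return weights


def morans_i(values, weights):
    """Global Moran's I of the values for a given spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(weights[a][s] * dev[a] * dev[s] for a in range(n) for s in range(n))
    weight_sum = sum(sum(row) for row in weights)
    return (n / weight_sum) * cross / sum(d * d for d in dev)


def group_shares(grid):
    """Share of each cell in the total group population (omega_ga)."""
    flat = [value for row in grid for value in row]
    return [value / sum(flat) for value in flat]


# Hypothetical minority counts per cell on a 4 x 4 grid
checkerboard = [[10 * ((r + c) % 2) for c in range(4)] for r in range(4)]
clustered = [[10 if c < 2 else 0 for c in range(4)] for r in range(4)]

w = rook_weights(4, 4)
i_checkerboard = morans_i(group_shares(checkerboard), w)
i_clustered = morans_i(group_shares(clustered), w)
```

The alternating pattern produces strong negative spatial autocorrelation and the block pattern a clearly positive value, although both are indistinguishable for an aspatial index. A distance-based definition of neighbourhood would only require a different weights matrix.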
In order to investigate whether the different measurement issues affect the findings of previous studies, we compare our results with the evidence for German cities summarized in Helbig and Jähnen (2018). German cities seem to be well suited for an analysis of the robustness of a cross-sectional comparison of segregation levels because urban areas show rather distinct migration histories that, in turn, might cause a significant variation in the degree of ethnic segregation.

Results
Table 1 presents summary statistics for some basic city characteristics and the applied segregation measures. Altogether, the level of segregation in German cities is low to moderate as a dissimilarity index of less than 0.3 is usually considered low. [9] This also applies to the second dimension of segregation, i.e. exposure. The low isolation index indicates that the average share of foreigners in all grids is rather close to a level that corresponds with a distribution of the migrants across areas in proportion to the total population. However, there is also a significant variation in ethnic segregation across cities as indicated by the standard deviation and the spread. Moreover, the mean redistribution required to reach evenness differs considerably between the traditional and the modified dissimilarity index. The level of unevenness indicated by the modified measure is much lower with on average 6% of the city population that would have to relocate to achieve an even distribution of foreigners and natives across grid cells. This is in sharp contrast to a share of 30% suggested by the traditional dissimilarity index. Thus, the assumptions regarding the reallocation of population across neighbourhoods that the use of different segregation measures involves matter a lot for the indicated level of segregation.

Note: The default grid size is 500 × 500 meters.

[8] We use two definitions of neighbourhood: either binary contiguity, i.e. neighbouring grids have a common border, or distance based with a cut-off distance of two kilometres.

[9] A dissimilarity index between 0.3 and 0.6 is considered moderate for urban areas in the U.S. and an isolation index above 0.3 as indicating a ghetto in the city (Cutler et al. 1999).
The results summarized in Helbig and Jähnen (2018) point to a similar level of unevenness. The deviation amounts to merely 5 percentage points if we use our default grid size (500 meters). Of course, this comparison might suffer from the MAUP as the average size of areas likely differs across the studies. If we use the number of neighbourhoods in the city as an indication of the mean area size, we should choose larger grid cells as a reference. The average number of areas amounts to almost 64 in Helbig and Jähnen (2018), while our default implies 459. Thus, a grid size of 1000 meters (115 areas) or 2000 meters (29 areas) is more appropriate as a reference. This comparison points to moderate effects of the MAUP and, thus, previous studies provide reasonable approximations of the average level of ethnic segregation in German cities. The findings also confirm earlier studies pointing to an increasing level of segregation as the spatial resolution rises (e.g. Wong 1997; 2003). Sleutjes et al. (2018) compare findings on ethnic segregation for administrative units and an egocentric measurement for the Amsterdam Metropolitan area. They conclude that applying administrative districts likely results in a downward bias of exposure at a low spatial scale and an upward bias at larger scale levels.
Controlling for random segregation, all measures decrease in size, but still point to a significant segregation. This is in line with corresponding evidence on systematic segregation in Germany provided by Glitz (2014). We only consider our default grid size in this context. The average dissimilarity index declines by 18% if we account for unevenness under random allocation. The difference between the standard and the systematic measure amounts to 21% for the modified segregation index and 8% for isolation. We use citizenship instead of the migration background to operationalize the ethnic background. This implies an underestimation of the size of the minority group. The differences between the traditional and systematic measures are therefore likely even smaller if we could use the migration background instead of nationality to define the minority group. Carrington and Troske (1997) show that segregation under random allocation tends to increase with declining minority share.
Furthermore, the findings suggest that standard aspatial approaches might underrate the degree of segregation to some extent because they ignore the spatial clustering of ethnic groups. The Moran's I tests on spatial autocorrelation are highly significant at conventional levels. The results point to an important spatial dependence of dissimilarity for 82 out of 83 cities across all measures summarized in Table 1. The clustering is more pronounced if we consider deviations from a uniform distribution (see Eq. 6) as compared to using the rest of the population as a reference.
Altogether, the considered measurement problems do not seem to severely affect the average degree of segregation in German cities indicated by the applied indicators. Using administrative units instead of grid cells and varying the size of the spatial units gives rise to moderate changes. This also applies to differences between standard measures and indicators that refer to systematic segregation. All results point to an average level of segregation that is modest at most. However, the implied amount of redistribution is sensitive to the assumptions regarding reallocation of population across neighbourhoods that the use of different measures involves. The amount of reallocation decreases by a factor of 5 if we use the adjusted instead of the standard dissimilarity index.
Johnston et al. (2007) note that in metropolitan areas in which members of an ethnic group are very unevenly distributed across urban districts, they also tend to be more isolated from members of other groups. We might as well expect a significant spatial autocorrelation, i.e. districts in which a specific group dominates the population likely cluster in space. Therefore, we investigate whether there is an overlap of the different dimensions of segregation considered here. However, the correlation analysis summarized in Table 2 suggests that the ranking of cities differs significantly for some segregation measures.
This applies in particular to the standard and adjusted dissimilarity index. In fact, the results of the two indicators show a weak negative correlation (not significant at the 5% level). The corresponding scatterplot (Fig. 3) shows that the disparity between the two measures is mainly caused by specific groups of cities. We use mean values of the indices, indicated by the lines, to illustrate the differences in the ranking; the grid size is the default of 500 × 500 meters. In particular, the relative position of urban areas in East Germany changes. While they are highly segregated according to the traditional dissimilarity index, the adjusted measure suggests that East German cities are marked by very low levels of ethnic segregation. Moreover, we detect a significant variation for West German cities with a high share of foreign population, such as Munich and Frankfurt. In contrast, there is no important correlation between city size and the position in Fig. 3, as indicated by the size of the circles. The percentage of the foreign population and the chosen segregation measure clearly influence the indicated redistribution needs. The dissimilarity index tends to overrate the relative degree of residential segregation in East German cities, because the percentage of the foreign population is rather low in these areas (and vice versa for the aforementioned West German cities). The share of the foreign (native) population matters in this context because the modified index is equal to the standard index times a scaling factor that declines as group size becomes more unequal (see Eq. 2).
Note on spatial weights: Comparing the results for different spatial weights, there is some indication that the similarity of grids decreases with increasing distance. Using a cut-off distance of two kilometres instead of binary contiguity results in a significant decline in the average level of Moran's I.
However, there are also regions which show an above (or below) average segregation irrespective of the applied measures. This is true in particular for Berlin and cities located in the old industrial Ruhr area such as Dortmund. The results in Table 2 also show that the correlation between the dissimilarity and the isolation index is moderate (0.37), while the adjusted dissimilarity measures seem to indicate evenness as well as exposure (correlation with isolation index: 0.81).
The MAUP and ignoring random unevenness give rise to minor changes in the city ranking. We find a close match among segregation measures for different grid sizes and with indicators that account for random segregation. Moreover, our findings closely resemble the results for German cities in Helbig and Jähnen (2018). All correlation coefficients are close to or larger than 0.9. Altogether, important differences in the ranking of the cities are the exception rather than the rule. It seems that the measurement issues considered in this analysis do not severely affect the ranking of the regions. The only exception refers to the most common segregation measure, the traditional dissimilarity index, and its correlation with the adjusted index and the isolation index.
Moreover, we detect a significant positive correlation between the dissimilarity index and Moran's I, irrespective of the applied spatial weight matrix. This result indicates that the disparities in segregation levels of cities likely increase if we take into account the spatial clustering of ethnic groups. Cities like Dortmund and Berlin, which show the highest levels of segregation in Germany, are also characterised by a strong clustering of foreigners and natives, i.e. the demographic composition of neighbouring grid cells is rather similar in these cities.
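For readers less familiar with Moran's I, the following minimal sketch shows how the statistic responds to the spatial arrangement of grid cells. It assumes the textbook definition I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2 with binary rook contiguity weights (cells sharing an edge are neighbours); the function name, the weighting choice and the two hypothetical grids of minority shares are illustrative and do not reproduce the variables or weight matrices of our analysis.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with binary rook contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    values = [value for row in grid for value in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[value - mean for value in row] for row in grid]

    cross = 0.0      # sum over neighbouring pairs of z_i * z_j
    weight_sum = 0   # S0: total number of (directed) neighbour links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1

    squared = sum(d * d for row in dev for d in row)
    return (n / weight_sum) * cross / squared


# Two hypothetical cities with identical cell values but different layouts.
clustered = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [1, 1, 0, 0],
             [1, 1, 0, 0]]
checkerboard = [[1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1]]

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
print(f"clustered: {i_clustered:.3f}, checkerboard: {i_checkerboard:.3f}")
```

Both layouts contain the same cell values, so an aspatial index that only looks at the distribution across cells cannot tell them apart, whereas Moran's I is positive for the clustered layout and negative for the checkerboard.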
The variation in the city ranking across distinct dimensions and measures of residential segregation seems to be linked to different migration histories of the cities, partly caused by the division of the country after World War II. These histories gave rise to significant differences in the composition of the foreign workforce across cities. Immigration to West Germany was dominated by the recruitment of guest workers during the labour shortage in the 1950s and 1960s. The migration history of East German cities since the 1950s differs significantly from immigration to urban areas in West Germany. Großmann et al. (2015) note that most foreign workers from the former Soviet Union and other socialist countries had to leave Germany after 1989. Thus, the majority of migrants who reside in East German cities today immigrated after the fall of the Berlin Wall (see also Buch et al. 2018).
Different migration histories in both parts of the country have long-lasting effects. For instance, the nationality of the largest ethnic group still systematically differs across groups of cities. Turks are the largest foreign group in almost all West German cities. In East German cities (apart from Berlin), Ukrainians, Russians and Vietnamese form the largest groups. This calls for a more detailed analysis of segregation that also considers differences in the level of segregation across various migrant groups.
Many studies on ethnic segregation examine the spatial distribution of specific ethnic groups. For instance, Hårsman (2006) and Friedrichs (1998) analyse residential segregation of Turks in Sweden and Germany, respectively. Maré [...] However, measurement problems become more prominent when the size of the considered minority group declines. We therefore examine whether differences between distinct measures tend to increase if we analyse ethnic segregation of smaller population groups. Evidence provided by Carrington and Troske (1997) indicates that random segregation in particular may then become an issue. Our findings confirm this hypothesis and their results. In order to evaluate the effect of a declining size of the minority group, we investigate the difference between the traditional dissimilarity index and the corresponding systematic measure for different nationalities. Fig. 4 summarizes the results for Berlin and underlines that the necessity to control for random segregation clearly increases as the size of the minority group, indicated by the size of the circles, declines. The difference between the traditional measure and the systematic index, i.e. the distance from the 45-degree line, continuously increases as groups become smaller. More precisely, the standard dissimilarity index suggests that the smallest groups hardly ever co-locate with other groups in specific areas in Berlin.
Decomposing the index into a systematic and a random part reveals that this finding is almost completely driven by random segregation.
These discrepancies may also adversely affect the ranking of regions since the size of ethnic groups differs significantly across urban areas, as discussed above with a focus on East and West German cities. In order to check whether the size of the minority group has an important impact on the cross-sectional comparison of ethnic segregation, we investigate two subgroups in more detail. Table 3 shows the correlation between different segregation measures for three minority groups that differ in size. We consider the spatial distribution of all foreign workers (as before), all nationalities of former recruitment countries (i.e. guest workers) and Turks. The results indicate that differences between the measures and implications for the ranking of cities tend to increase as the group size declines. While we observe a positive correlation between the traditional dissimilarity index and the isolation index for all foreigners, there is no statistically significant relationship for the two subgroups. The traditional and adjusted dissimilarity indices increasingly produce contrary rankings of cities as the considered minority group becomes smaller. Moreover, the correlation between the traditional and systematic measures declines for the recruitment nationalities and the Turks. The adjusted dissimilarity index is a noteworthy exception and produces rather robust results since it includes a scaling factor that compensates for a declining size of the minority group.
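How closely two measures agree on the ordering of cities can be summarized by a rank correlation. The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks; this is one common choice and is shown only as an assumption, since it need not coincide with the coefficient reported in our tables. The function names and the index values for five unnamed cities are hypothetical.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical index values for five cities (made-up numbers).
standard_index = [0.30, 0.22, 0.35, 0.18, 0.27]
adjusted_index = [0.05, 0.09, 0.06, 0.10, 0.04]

rho = spearman(standard_index, adjusted_index)
print(f"rank correlation: {rho:.2f}")
```

A coefficient near one means that two measures order the cities almost identically, while values near zero or below indicate that the choice of measure changes the ranking.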

Conclusions
All measures applied to investigate ethnic segregation consistently point to a low to moderate level of segregation in German cities. The results also unambiguously indicate a significant variation across cities. These findings emerge irrespective of the applied segregation index and considered measurement issues as long as the segregation between all migrants and natives is concerned. Ignoring random segregation results in a modest upward bias of the mean segregation level. In line with previous evidence, the measured level of segregation increases as spatial resolution rises. We also detect an important spatial clustering of ethnic groups and a positive correlation between segregation level and spatial clustering at the city level. The latter implies that we will underestimate the degree of segregation in urban areas and its variation if we neglect spatial autocorrelation. However, earlier studies of ethnic segregation, which rely on data for administrative spatial units and mainly apply the dissimilarity index, provide reasonable information on the average degree of segregation in urban areas in Germany if they refer to the entire migrant population. The MAUP and random segregation do not significantly affect the ranking of the cities either. The ranking list is, however, sensitive to the dimension of segregation and to the assumptions regarding reallocation of population across neighbourhoods that the use of different measures involves. In particular, the results for the most common segregation measure, the traditional dissimilarity index, show only a low correlation with the modified dissimilarity index and the isolation index. This is due to specific groups of cities whose relative position changes significantly across indices and which either have a relatively low (East German cities) or high population share of foreigners (specific West German cities). 
While East German cities are highly segregated according to the results of the traditional dissimilarity index, the adjusted measure and the isolation index suggest that these urban areas are marked by low levels of ethnic segregation. Thus, considering solely one dimension of segregation and focusing on the dissimilarity index might not provide reliable evidence on regional disparities in ethnic segregation.
The differences between East and West German cities point to the importance of the size of the minority group when it comes to measurement issues. Often segregation studies examine the spatial distribution of specific ethnic groups because some minorities seem to be more segregated than other groups. For instance, the Turks and guest worker nationalities have been the focus of different European studies (e.g. Schönwälder and Söhn 2009; Hårsman 2006). Our results clearly indicate that the severity of measurement problems significantly increases with declining size of the minority group under consideration. If the size of specific ethnic groups differs across regions, this will also affect the ranking of regions. Thus, the necessity to properly account for random segregation, the MAUP and spatial autocorrelation increases as segregation analyses become more detailed and consider specific (small) minority groups.
However, the sensitivity to measurement problems clearly differs across popular segregation indices. Our findings indicate that the adjusted dissimilarity index produces rather robust results and seems to capture unevenness and exposure. It is noteworthy that our results raise rather strong concerns over the use of the most common measure, the traditional dissimilarity index. The index might be useful in order to compare findings with evidence provided by previous studies that often make use of the dissimilarity index only. However, it is advisable to consider alternative measures in order to examine the robustness of the results. As previous studies on residential segregation of specific ethnic groups tend to rely on the traditional dissimilarity index and ignore random segregation, it seems advisable to reassess these findings.

Fig. 5
Correlation between results based on working population and findings by Helbig and Jähnen (2018) for population data