An open-source framework for non-spatial and spatial segregation measures: the PySAL segregation module

Cortes, Renan Xavier; Rey, Sergio; Knaap, Elijah; Wolf, Levi John

doi:10.1007/s42001-019-00059-3

Research Article
Published: 23 November 2019

Volume 3, pages 135–166, (2020)

Journal of Computational Social Science

Renan Xavier Cortes (ORCID: orcid.org/0000-0002-1889-5282)¹, Sergio Rey¹, Elijah Knaap¹ & Levi John Wolf²

Abstract

In human geography and the urban social sciences, the segregation literature typically engages with five conceptual dimensions along which a given society may be considered segregated: evenness, isolation, clustering, concentration and centralization (all of which can incorporate or omit spatial context). Over the last several decades, dozens of segregation indices have been proposed and studied in the literature, each of which is designed to focus on the nuances of a particular dimension, or correct an oversight in earlier work. Despite their increasing proliferation, however, few of these indices remain used in practice beyond their original conception, due in part to complex formulae and data requirements, particularly for indices that incorporate spatial context. Furthermore, existing segregation software typically fails to provide inferential frameworks for either single-value or comparative hypothesis testing. To fill this gap, we develop an open-source Python package designed as a submodule for the Python Spatial Analysis Library, PySAL. This new module tackles the problem of segregation point estimation for a wide variety of spatial and aspatial segregation indices, while providing a computationally based hypothesis testing framework that relies on simulations under the null hypothesis. We illustrate the use of this new library using tract-level census data in two American cities.

Notes

For a literature review on segregation, we refer to Ref. [44]. We also refer to Refs. [10] and [18] as important literature in segregation.
For application examples, see [8, 14, 26, 27, 46].
More recently, Ref. [12] addressed this problem assuming a nonparametric binomial mixture of the frequencies.
Table 2 of Ref. [3] lists other software that also computes these indices, such as Refs. [36] and [50], although these are not open source.
The original paper considers 43 different indices because it includes three versions of the Atkinson index. However, these versions differ only in the value of the parameter b; therefore, we count this index only once.
Most notably, shapefiles are limited to ten-character column names, and they are difficult to transport across computing environments because the specification actually requires a minimum of four files, not a single file as the name would suggest.
One of the most prominent is the issue with the indices presented in Ref. [49], discussed at the bottom of page 6 of Ref. [47]. The same problems were identified during the construction of the present module, and the default approach for these indices in this Python package follows the latter study.
In terms of software, so far, we are unaware of any that performs inference for comparison between them.
This last case is unusual, but our framework permits any of these combinations, as presented in Sect. ??.
Available at https://github.com/pysal/segregation.
More recently, some other measures were added to SM, but we conducted the current work with the original 25.
In addition, the module has a function/class named Compute_All_Segregation that performs point estimation of several segregation measures at once.
It is worth mentioning that using a geopandas GeoDataFrame for the non-spatial indices is also valid, since it “behaves” as a usual pandas DataFrame.
Assuming that $n_{ij}$ is the population of unit i of group j, this approach assumes that the distribution of people from each j group is a multinomial distribution with probabilities given by $\frac{\sum _{j}n_{ij}}{\sum _{i}\sum _{j}n_{ij}}=\frac{n_{i.}}{n_{..}}$.
We are aware that some approaches would not be appropriate for some measures, but we chose to allow these combinations so that our framework remains as generic as possible. For example, the Modified Dissimilarity (Dct) and Gini (Gct) indices are themselves built on the distance from evenness obtained through sampling; therefore, the "evenness" value for null_approach would not be the most appropriate for these indices.
We thank a reviewer for drawing attention to this point in the manuscript.
There is also a statistic attribute to access the original point estimation of the measure.
Note that in this case, each measure has to be the same SM class as it would not make much sense to compare, for example, a Gini Index with a Delta (DEL) Index.
We use the word composition to refer to the relative frequency of the group of interest in each unit. For example, if a unit has a total population of 50, of which 5 people belong to group A, the group A composition of this unit is 10%.
The details of the construction of these counterfactual values are presented in Appendix B.
We also noticed that for most of the indices, especially the spatial ones, SM was much faster to estimate than the implementation of Ref. [47].
We used a total population of 100,000 and generated a random composition for each unit drawn from a Uniform distribution between 0 and 1.
The indices were fitted using the default input values. Although this can be a source of differences in the values, we highlight that these defaults are roughly comparable: all indices that rely on simulations (Dct, Gct, and Dbc) use 500 iterations, and the indices that rely on integration (R and SPP) use 1000 thresholds for the integral approximation. The index Ddc uses an optimization tolerance of $10^{-5}$.
The values marked with * are virtually the same, although OasisR has a misspecification in $d_{ii}$ that does not follow Ref. [24]. This difference can be checked at https://github.com/cran/OasisR/pull/1/commits/cc3681dae96188663230cf140d0cf41fd90e45cd.
Composed of five counties: New York County, Bronx County, Kings County, Queens County and Richmond County.
Both regions are similar in terms of number of spatial units, as Los Angeles County has 2346 census tracts in 2010 and New York City has 2168.
Once again, all simulations were run using the default values of the input parameters and 500 iterations, in parallel on 6 cores in a Jupyter Notebook [22], using an Intel(R) Core(TM) i7-8750H CPU at 2.21 GHz with 16 GB of RAM. Running all the application results presented here took approximately 34.7 h.
This approach does not apply to measures that do not take spatial context into consideration since each value for the simulations would be the same along the permutations.
${H_0:} \mathrm{Los}\ \mathrm{Angeles}\ {\mathrm{segregation}_{2010}}\ - \mathrm{Los}\ \mathrm{Angeles}\ {\mathrm{segregation}_{2000}} = 0$.
With the caveat that Exposure is inversely related to segregation and, thus, is located in the right tail of the distribution under the null hypothesis.
The p value of ACO was $\approx $ 0.74 and of RCO was $\approx $ 0.816.
${H_0:} \mathrm{Los}\ \mathrm{Angeles}\ \mathrm{segregation} - \mathrm{New}\ \mathrm{York}\ \mathrm{segregation} = 0$.
For xPy and DDxPy, it presented lower values, but the interpretation is the same.
However, an unexpected result was that, for the Ddc Index, Los Angeles was significantly more segregated.
https://github.com/UDST.
This table does not necessarily reflect the original or pioneering paper for each measure, but rather the literature related to the formulas presented in this Appendix.
We considered including the mixture of betas approach of Ref. [35] for the D, G and H indices, as the author kindly shared the original code. However, due to convergence problems, we chose not to include it in the current version of SM.

References

1. Allen, J. P., & Turner, E. (2012). Black-White and Hispanic-White segregation in US counties. The Professional Geographer, 64(4), 503–520.
2. Allen, R., Burgess, S., Davidson, R., & Windmeijer, F. (2015). More reliable inference for the dissimilarity index of segregation. The Econometrics Journal, 18(1), 40–66.
3. Apparicio, P., Martori, J. C., Pearson, A. L., Fournier, É., & Apparicio, D. (2014). An open-source software for calculating indices of urban residential segregation. Social Science Computer Review, 32(1), 117–128.
4. Boisso, D., Hayes, K., Hirschberg, J., & Silber, J. (1994). Occupational segregation in the multidimensional case: Decomposition and tests of significance. Journal of Econometrics, 61(1), 161–171.
5. Brown, L. A., & Chung, S. Y. (2006). Spatial segregation, segregation indices and the geographical perspective. Population, Space and Place, 12(2), 125–143.
6. Carrillo, P. E., & Rothbaum, J. L. (2016). Counterfactual spatial distributions. Journal of Regional Science, 56(5), 868–894.
7. Carrington, W. J., & Troske, K. R. (1997). On measuring segregation in samples with small units. Journal of Business & Economic Statistics, 15(4), 402–409.
8. Carrington, W. J., & Troske, K. R. (1998). Interfirm segregation and the Black/White wage gap. Journal of Labor Economics, 16(2), 231–260.
9. Clark, W. A., & Östh, J. (2018). Measuring isolation across space and over time with new tools: Evidence from Californian metropolitan regions. Environment and Planning B: Urban Analytics and City Science, 45(6), 1038–1054.
10. Cowgill, D. O., & Cowgill, M. S. (1951). An index of segregation based on block statistics. American Sociological Review, 16(6), 825–831.
11. Devroye, L. (1986). Sample-based non-uniform random variate generation. In Proceedings of the 18th conference on winter simulation (pp. 260–265). ACM.
12. d’Haultfoeuille, X., & Rathelot, R. (2017). Measuring segregation on small units: A partial identification analysis. Quantitative Economics, 8(1), 39–73.
13. Duncan, O. D., & Duncan, B. (1955). A methodological analysis of segregation indexes. American Sociological Review, 20(2), 210–217.
14. Hellerstein, J. K., & Neumark, D. (2008). Workplace segregation in the United States: Race, ethnicity, and skill. The Review of Economics and Statistics, 90(3), 459–477.
15. Hong, S. Y., O’Sullivan, D., & Sadahiro, Y. (2014). Implementing spatial segregation measures in R. PloS One, 9(11), e113767.
16. Hong, S. Y., & Sadahiro, Y. (2014). Measuring geographic segregation: A graph-based approach. Journal of Geographical Systems, 16(2), 211–231.
17. Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55.
18. James, D. R., & Taeuber, K. E. (1985). Measures of segregation. Sociological Methodology, 15, 1–32.
19. Johnston, R., Poulsen, M., & Forrest, J. (2007). Ethnic and racial segregation in US metropolitan areas, 1980–2000: The dimensions of segregation revisited. Urban Affairs Review, 42(4), 479–504.
20. Jones, K., Johnston, R., Manley, D., Owen, D., & Charlton, C. (2015). Ethnic residential segregation: A multilevel, multigroup, multiscale approach exemplified by London in 2011. Demography, 52(6), 1995–2019.
21. Jordahl, K. (2014). Geopandas: Python tools for geographic data. https://github.com/geopandas/geopandas. Accessed 3 Apr 2019.
22. Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B. E., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J. B., Grout, J., & Corlay, S., et al. (2016). Jupyter notebooks-a publishing format for reproducible computational workflows. In ELPUB (pp. 87–90). https://jupyter.org/. Accessed 3 Apr 2019.
23. Lee, D., Minton, J., & Pryce, G. (2015). Bayesian inference for the dissimilarity index in the presence of spatial autocorrelation. Spatial Statistics, 11, 81–95.
24. Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281–315.
25. Massey, D. S., & Denton, N. A. (1989). Hypersegregation in US metropolitan areas: Black and Hispanic segregation along five dimensions. Demography, 26(3), 373–391.
26. Massey, D. S., & Denton, N. A. (1993). American apartheid: Segregation and the making of the underclass. Cambridge: Harvard University Press.
27. Massey, D. S., & Tannen, J. (2015). A research note on trends in black hypersegregation. Demography, 52(3), 1025–1034.
28. McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt, J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 51–56).
29. Morgan, B. S. (1983). A distance-decay based interaction index to measure residential segregation. Area, 15(3), 211–217.
30. Morrill, R. L. (1991). On the measure of geographic segregation. Geography Research Forum, 11, 25–36.
31. Napierala, J., & Denton, N. (2017). Measuring residential segregation with the ACS: How the margin of error affects the dissimilarity index. Demography, 54(1), 285–309.
32. Park, R. E. (1926). The urban community as a spatial pattern and a moral order. In Urban social segregation (pp. 21–31).
33. R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0. http://www.R-project.org. Accessed 3 Apr 2019.
34. Ransom, M. R. (2000). Sampling distributions of segregation indexes. Sociological Methods & Research, 28(4), 454–475.
35. Rathelot, R. (2012). Measuring segregation when units are small: A parametric approach. Journal of Business & Economic Statistics, 30(4), 546–553.
36. Reardon, S. F., & Townsend, J. B. (1999). SEG: Stata module to compute multiple-group diversity and segregation indices. Statistical Software Components, Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s375001.html. Accessed 3 Apr 2019.
37. Reardon, S. F., & Firebaugh, G. (2002). Measures of multigroup segregation. Sociological Methodology, 32(1), 33–67.
38. Reardon, S. F., & O’Sullivan, D. (2004). Measures of spatial segregation. Sociological Methodology, 34(1), 121–162.
39. Rey, S. J. (2004). Spatial analysis of regional income inequality. Spatially Integrated Social Science, 1, 280–299.
40. Rey, S. J., & Anselin, L. (2010). PySAL: A Python library of spatial analytical methods. In Handbook of applied spatial analysis (pp. 175–193). Springer.
41. Rey, S. J., & Sastré-Gutiérrez, M. L. (2010). Interregional inequality dynamics in Mexico. Spatial Economic Analysis, 5(3), 277–298.
42. Roberto, E. (2018). The spatial proximity and connectivity method for measuring and analyzing residential segregation. Sociological Methodology, 48(1), 182–224.
43. Rossum, G. (1995). Python reference manual. Technical report. The Netherlands: Amsterdam.
44. Royuela, V., & Vargas, M., et al. (2010). Residential segregation: A literature review. Technical report.
45. Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28), 307–317.
46. Söderström, M., & Uusitalo, R. (2010). School choice and segregation: Evidence from an admission reform. Scandinavian Journal of Economics, 112(1), 55–76.
47. Tivadar, M. (2019). OasisR: An R package to bring some order to the world of segregation measurement. Journal of Statistical Software, 89(1), 1–39.
48. Waskom, M., Botvinnik, O., O’Kane, D., Hobson, P., Lukauskas, S., Gemperline, D. C., Augspurger, T., Halchenko, Y., Cole, J. B., Warmenhoven, J., de Ruiter, J., Pye, C., Hoyer, S., Vanderplas, J., Villalba, S., Kunter, G., Quintero, E., Bachant, P., Martin, M., Meyer, K., Miles, A., Ram, Y., Yarkoni, T., Williams, M. L., Evans, C., Fitzgerald, C., Brian, Fonnesbeck, C., Lee, A., & Qalieh, A. (2017). mwaskom/seaborn: v0.8.1 (September 2017). https://doi.org/10.5281/zenodo.883859.
49. Wong, D. W. (1993). Spatial indices of segregation. Urban Studies, 30(3), 559–572.
50. Wong, D. W. (2003). Implementing spatial segregation measures in GIS. Computers, Environment and Urban Systems, 27(1), 53–70.

Acknowledgements

We are grateful for the support of National Science Foundation (NSF) (Award 1831615) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) foundation (Process 88881.170553/2018-01).

Author information

Authors and Affiliations

Center for Geospatial Sciences, University of California, Riverside, USA
Renan Xavier Cortes, Sergio Rey & Elijah Knaap
School of Geographical Sciences, University of Bristol, Bristol, UK
Levi John Wolf

Corresponding author

Correspondence to Renan Xavier Cortes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A: Point estimation details

Here, we present and explain each formula for the segregation measures presented in Table 1 of Section 2.1. The literature used for each measure, together with the respective dimension, can be found in Table 4 (see Footnotes 36 and 37).

For consistency of notation, we assume that $n_{ij}$ is the population of unit $i \in \{1,\ldots , I\}$ of group $j \in \{x, y\}$, and that $\sum _{j}n_{ij} = n_{i.}$, $\sum _{i}n_{ij} = n_{.j}$, $\sum _{i}\sum _{j}n_{ij} = n_{..}$, ${\tilde{s}}_{ij} = \frac{n_{ij}}{n_{i.}}$, ${\hat{s}}_{ij} = \frac{n_{ij}}{n_{.j}}$. The segregation indices can be built for any group j of the data.

The Dissimilarity Index (D) is given by:

$$\begin{aligned} D=\sum _{i=1}^{I}\frac{n_{i.}\mid {\tilde{s}}_{ij}-\frac{n_{.j}}{n_{..}}\mid }{2n_{..}\frac{n_{.j}}{n_{..}}\left( 1-\frac{n_{.j}}{n_{..}} \right) }. \end{aligned}$$

(2)
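
As a hedged illustration, Eq. (2) can be transcribed directly into NumPy. The function name `dissimilarity` and the toy counts are assumptions made for this sketch; they are not the module's API.

```python
import numpy as np

def dissimilarity(group, total):
    """Dissimilarity index D, a direct transcription of Eq. (2).

    group: population of the group of interest in each unit (n_ij)
    total: total population of each unit (n_i.)
    """
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()          # n_..
    p = group.sum() / n_all      # global proportion n_.j / n_..
    s_tilde = group / total      # within-unit proportion of the group
    return np.sum(total * np.abs(s_tilde - p)) / (2 * n_all * p * (1 - p))

# Made-up toy data: complete separation, then a perfectly even split.
d_separated = dissimilarity([10, 0], [10, 10])
d_even = dissimilarity([5, 5], [10, 10])
```

Under these toy inputs, the fully separated layout sits at the upper end of the index and the even layout at the lower end.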

The spatial D (SD) is given by:

$$\begin{aligned} SD = D-\frac{\sum _{i_1=1}^{I}\sum _{i_2=1}^{I}\left| {\tilde{s}}_{ij}^{i_1}-{\tilde{s}}_{ij}^{i_2} \right| c_{i_1i_2}}{\sum _{i_1=1}^{I}\sum _{i_2=1}^{I}c_{i_1i_2}}, \end{aligned}$$

(3)

where ${\tilde{s}}_{ij}^{i_1}$ and ${\tilde{s}}_{ij}^{i_2}$ are the proportions of the minority population in the units $i_1$ and $i_2$, respectively and where $c_{i_1i_2}$ denotes an element at $(i_1,i_2)$ in a matrix C, which becomes one only if $i_1$ and $i_2$ are considered neighbors.
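
A minimal sketch of Eq. (3), assuming a binary contiguity matrix supplied as a plain NumPy array. The helper names and the four-unit chain below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def dissimilarity(group, total):
    """Classical D from Eq. (2)."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()
    p = group.sum() / n_all
    return np.sum(total * np.abs(group / total - p)) / (2 * n_all * p * (1 - p))

def spatial_dissimilarity(group, total, c):
    """Spatial D from Eq. (3); c is a binary neighbor matrix C."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    c = np.asarray(c, dtype=float)
    s = group / total
    # Absolute composition gap for every ordered pair of units.
    gaps = np.abs(s[:, None] - s[None, :])
    return dissimilarity(group, total) - (gaps * c).sum() / c.sum()

# Four units in a row; each unit neighbors the next one (made-up layout).
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
sd = spatial_dissimilarity([10, 10, 0, 0], [10, 10, 10, 10], chain)
```

Here only the middle pair of neighbors differs in composition, so the spatial term subtracts 2/6 from a classical D of 1.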

The boundary spatial D (BSD) is given by:

$$\begin{aligned} BSD = D - \frac{1}{2}{\sum _{i_1=1}^{I}\sum _{i_2=1}^{I}w_{i_1i_2} \left| {\tilde{s}}_{ij}^{i_1} - {\tilde{s}}_{ij}^{i_2} \right| }, \end{aligned}$$

(4)

where

$$\begin{aligned} w_{i_1i_2} = \frac{cb_{i_1i_2}}{\sum _{i_2=1}^{I}cb_{i_1i_2}}, \end{aligned}$$

${\tilde{s}}_{ij}^{i_1}$ and ${\tilde{s}}_{ij}^{i_2}$ are the proportions of the minority population in units $i_1$ and $i_2$, respectively, and $cb_{i_1i_2}$ is the length of the common boundary of areal units $i_1$ and $i_2$.

The perimeter/area ratio spatial D (PARD) is a Spatial Dissimilarity Index that takes into consideration the perimeter and the area of each unit by including the following multiplicative factor in each summand of the second term of BSD (the spatial effect):

$$\begin{aligned} \frac{\frac{1}{2}\left[ \left( \frac{P_{i_1}}{A_{i_1}} \right) +\left( \frac{P_{i_2}}{A_{i_2}} \right) \right] }{\mathrm{MAX}\left( \frac{P}{A} \right) }, \end{aligned}$$

(5)

where $P_i$ and $A_i$ are the perimeter and area of unit i, respectively, and $\mathrm{MAX}(P{/}A)$ is the maximum perimeter–area ratio, or the minimum compactness, of an areal unit found in the study region.

The Gini coefficient (G) is given by:

$$\begin{aligned} G=\sum _{i_1=1}^{I}\sum _{i_2=1}^{I}\frac{n_{i_1.}n_{i_2.}\mid {\tilde{s}}_{ij}^{i_1}-{\tilde{s}}_{ij}^{i_2}\mid }{2n_{..}^2\frac{n_{.j}}{n_{..}}\left( 1-\frac{n_{.j}}{n_{..}} \right) }. \end{aligned}$$

(6)
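
Eq. (6) is a double sum over all pairs of units, which can be sketched with NumPy broadcasting. The function name `gini_segregation` and the toy data are assumptions for this example.

```python
import numpy as np

def gini_segregation(group, total):
    """Gini coefficient G from Eq. (6), using all ordered pairs of units."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()
    p = group.sum() / n_all
    s = group / total
    weights = np.outer(total, total)          # n_{i1.} * n_{i2.}
    gaps = np.abs(s[:, None] - s[None, :])    # |s_{i1} - s_{i2}|
    return (weights * gaps).sum() / (2 * n_all**2 * p * (1 - p))

# Made-up toy data: complete separation, then a perfectly even split.
g_separated = gini_segregation([10, 0], [10, 10])
g_even = gini_segregation([5, 5], [10, 10])
```

The pairwise matrices grow with the square of the number of units, so this form is only meant for small illustrations.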

The global entropy (E) is given by:

$$\begin{aligned} E = \frac{n_{.j}}{n_{..}} \log \left( \frac{1}{\frac{n_{.j}}{n_{..}}} \right) +\left( 1-\frac{n_{.j}}{n_{..}} \right) \log \left( \frac{1}{1-\frac{n_{.j}}{n_{..}}} \right) , \end{aligned}$$

(7)

while the unit’s entropy is analogously:

$$\begin{aligned} E_i = {\tilde{s}}_{ij} \log \left( \frac{1}{{\tilde{s}}_{ij}} \right) +\left( 1-{\tilde{s}}_{ij} \right) \log \left( \frac{1}{1-{\tilde{s}}_{ij}} \right) . \end{aligned}$$

(8)

Therefore, the Entropy Index (H) is given by:

$$\begin{aligned} H = \sum _{i=1}^{I}\frac{n_{i.}\left( E-E_i \right) }{En_{..}} \end{aligned}$$

(9)
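
Eqs. (7) to (9) can be combined in a short sketch. It assumes the usual convention that a term of the form 0 · log(1/0) counts as zero, which the text does not state explicitly; the function names are illustrative.

```python
import numpy as np

def binary_entropy(p):
    """p*log(1/p) + (1-p)*log(1/(1-p)), with 0*log(1/0) taken as 0."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    # The inner np.where keeps log() away from zero arguments.
    part_p = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    part_q = np.where(q > 0, -q * np.log(np.where(q > 0, q, 1.0)), 0.0)
    return part_p + part_q

def entropy_index(group, total):
    """Entropy index H from Eq. (9)."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()
    e_global = binary_entropy(group.sum() / n_all)   # E, Eq. (7)
    e_units = binary_entropy(group / total)          # E_i, Eq. (8)
    return np.sum(total * (e_global - e_units)) / (e_global * n_all)

# Made-up toy data: complete separation, then a perfectly even split.
h_separated = entropy_index([10, 0], [10, 10])
h_even = entropy_index([5, 5], [10, 10])
```

The sketch does not guard against a region where the group is absent or universal, in which case E is zero and H is undefined.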

The Atkinson Index (A) is given by:

$$\begin{aligned} A = 1 - \frac{\frac{n_{.j}}{n_{..}}}{1-\frac{n_{.j}}{n_{..}}}\left| \sum _{i=1}^{I}\left[ \frac{\left( 1-{\tilde{s}}_{ij} \right) ^{1-b}{\tilde{s}}_{ij}^b n_{i.}}{\frac{n_{.j}}{n_{..}}n_{..}} \right] \right| ^{\frac{1}{1-b}}, \end{aligned}$$

(10)

where b is a shape parameter that determines how to weight the increments to segregation contributed by different portions of the Lorenz curve.
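
A minimal sketch of Eq. (10), reading the per-unit weight in the numerator as the unit's total population $n_{i.}$. The function name and the choice b = 0.5 are assumptions for the example, not documented defaults.

```python
import numpy as np

def atkinson(group, total, b=0.5):
    """Atkinson index A from Eq. (10); b is the shape parameter (b != 1)."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()
    p = group.sum() / n_all
    s = group / total
    # Weighted sum inside the absolute value of Eq. (10).
    inner = np.sum((1 - s) ** (1 - b) * s ** b * total) / (p * n_all)
    return 1 - (p / (1 - p)) * np.abs(inner) ** (1 / (1 - b))

# Made-up toy data: complete separation, then a perfectly even split.
a_separated = atkinson([10, 0], [10, 10])
a_even = atkinson([5, 5], [10, 10])
```

With 0 < b < 1, every unit that contains only one group contributes zero to the inner sum, which is why complete separation yields the maximum value.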

The Concentration Profile (R) measure is discussed in Ref. [16] and addresses the evenness aspect of segregation. For a threshold proportion t, the share $\upsilon _t$ of the group living in units whose group proportion is at least t is given by:

$$\begin{aligned} \upsilon _t = \frac{\sum _{i=1}^{I}n_{ij}g(t,i)}{\sum _{i=1}^{I}n_{ij}}. \end{aligned}$$

(11)

In the equation, g(t, i) is a logical function that is defined as:

$$\begin{aligned} g(t,i) = {\left\{ \begin{array}{ll} 1 &{} \text {if }\frac{n_{ij}}{n_{i.}} \ge t \\ 0 &{} \text {otherwise} \end{array}\right. }. \end{aligned}$$

(12)

The Concentration Profile (R) is given by:

$$\begin{aligned} R=\frac{\frac{n_{.j}}{n_{..}}-\left( \int _{t=0}^{\frac{n_{.j}}{n_{..}}}\upsilon _t \,\mathrm{d}t - \int _{t=\frac{n_{.j}}{n_{..}}}^{1}\upsilon _t \,\mathrm{d}t \right) }{1-\frac{n_{.j}}{n_{..}}}. \end{aligned}$$

(13)

The SPP is similar to the Concentration Profile, but with the addition of the spatial component in the connecting function:

$$\begin{aligned} \eta _t = \frac{k^2-k}{\sum _{i_1}\sum _{i_2}\delta _{i_1i_2}}, \end{aligned}$$

(14)

where k refers to the sum of g(t, i) for a given t and $\delta _{i_1i_2}$ is the distance between $i_1$ and $i_2$. One way of determining $\delta _{i_1i_2}$ is to use a spatial structure matrix, W, whose entries are one if $i_1$ and $i_2$ are contiguous and zero otherwise. The distance $\delta _{i_1i_2}$ is then the neighbor order, that is, the number of steps between contiguous units needed to reach $i_2$ from $i_1$. For example, two census tracts, $x_1$ and $x_2$, that do not have a common boundary but are both adjacent to the same unit, $x_3$, are second-order neighbors, so $\delta _{12}$ becomes 2. As with the Concentration Profile, if the number of thresholds used is large enough, a smooth curve, or an SPP, can be constructed by plotting and connecting the values of $\eta _t$.

Isolation (xPx) assesses how much a minority group is exposed only to itself, in other words, how much its members interact only with members of the group to which they belong. Assuming $j = x$ as the minority group, the isolation of x is given by:

$$\begin{aligned} \mathrm{xP}x=\sum _{i=1}^{I}\left( {\hat{s}}_{ix} \right) \left( {\tilde{s}}_{ix} \right) . \end{aligned}$$

(15)

The Exposure (xPy) of x is given by

$$\begin{aligned} \mathrm{xP}y=\sum _{i=1}^{I}\left( {\hat{s}}_{ix} \right) \left( {\tilde{s}}_{iy} \right) . \end{aligned}$$

(16)

The correlation ratio (V or $\mathrm{Eta}^2$) is given by

$$\begin{aligned} V = \mathrm{Eta}^2 = \frac{\mathrm{xP}x - \frac{n_{.x}}{n_{..}}}{1 - \frac{n_{.x}}{n_{..}}}. \end{aligned}$$

(17)
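
The isolation index of Eq. (15) and the correlation ratio of Eq. (17) can be sketched together, since V is a rescaling of xPx. The function names and the two-unit data are made up for illustration.

```python
import numpy as np

def isolation(x, total):
    """Isolation xPx from Eq. (15): sum of s_hat_ix * s_tilde_ix."""
    x = np.asarray(x, dtype=float)
    total = np.asarray(total, dtype=float)
    s_hat = x / x.sum()     # share of group x living in each unit
    s_tilde = x / total     # share of each unit that belongs to group x
    return np.sum(s_hat * s_tilde)

def correlation_ratio(x, total):
    """Correlation ratio V (Eta squared) from Eq. (17)."""
    x = np.asarray(x, dtype=float)
    total = np.asarray(total, dtype=float)
    p = x.sum() / total.sum()
    return (isolation(x, total) - p) / (1 - p)

# Made-up toy data: group x is 80% of the first unit and 20% of the second.
xpx = isolation([8, 2], [10, 10])
v = correlation_ratio([8, 2], [10, 10])
```

For this toy region, xPx is 0.8·0.8 + 0.2·0.2 = 0.68, and rescaling by the global proportion of 0.5 gives V = 0.36.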

The SP Index is given by:

$$\begin{aligned} \mathrm{SP} = \frac{n_{.x}P_{xx} + n_{.y}P_{yy}}{n_{..}P_{tt}}, \end{aligned}$$

(18)

where

$$\begin{aligned}&P_{xx} = \sum _{i_1=1}^{I}\sum _{i_2=1}^{I}\frac{n_{i_1x}n_{i_2x}\zeta _{i_1i_2}}{n_{.x}^2}\\&P_{yy} = \sum _{i_1=1}^{I}\sum _{i_2=1}^{I}\frac{n_{i_1y}n_{i_2y}\zeta _{i_1i_2}}{n_{.y}^2}\\&P_{tt} = \sum _{i_1=1}^{I}\sum _{i_2=1}^{I}\frac{n_{i_1.}n_{i_2.}\zeta _{i_1i_2}}{n_{..}^2}\\&\zeta _{i_1i_2} = \mathrm{exp}(-d_{i_1i_2}), \end{aligned}$$

Here, $d_{i_1i_2}$ is a pairwise distance measure between areas $i_1$ and $i_2$, and the within-unit distance $d_{ii}$ is estimated as $d_{ii} = (\alpha a_i)^{\beta }$, where $a_i$ is the area of unit i. The defaults are $\alpha = 0.6$ and $\beta = 0.5$. For the distance measure, we first extract the centroid of each unit and then calculate the Euclidean distance between centroids.
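
A hedged sketch of the SP computation for two groups, reading the numerator and denominator weights of Eq. (18) as the group totals $n_{.x}$, $n_{.y}$ and $n_{..}$. Centroids and areas are passed as plain arrays; the function name and the toy geometry are assumptions, and no geometry library is used.

```python
import numpy as np

def spatial_proximity(x, y, centroids, areas, alpha=0.6, beta=0.5):
    """SP index of Eq. (18), assuming two groups so that totals are x + y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = x + y
    centroids = np.asarray(centroids, dtype=float)
    areas = np.asarray(areas, dtype=float)
    # Pairwise Euclidean distances between unit centroids.
    diff = centroids[:, None, :] - centroids[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Within-unit distance d_ii = (alpha * a_i) ** beta on the diagonal.
    d[np.diag_indices_from(d)] = (alpha * areas) ** beta
    zeta = np.exp(-d)

    def mean_proximity(n):
        # P_xx, P_yy or P_tt, depending on which counts are passed in.
        return (np.outer(n, n) * zeta).sum() / n.sum() ** 2

    p_xx, p_yy, p_tt = mean_proximity(x), mean_proximity(y), mean_proximity(t)
    return (x.sum() * p_xx + y.sum() * p_yy) / (t.sum() * p_tt)

# Made-up layout in which both groups have the same spatial distribution.
sp_same = spatial_proximity(
    x=[10, 20, 30], y=[20, 40, 60],
    centroids=[[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]],
    areas=[1.0, 1.5, 2.0],
)
```

When the two groups are distributed proportionally across units, the three proximities coincide and the index equals one, which is the no-clustering benchmark.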

The RCL measure is given by:

$$\begin{aligned} \mathrm{RCL} = \frac{P_{xx}}{P_{yy}} - 1. \end{aligned}$$

(19)

The Distance Decay Isolation (DDxPx) is given by:

$$\begin{aligned} \mathrm{DDxP}x=\sum _{i_1=1}^{I}\left( {\hat{s}}_{i_1x} \right) \left( \sum _{i_2=1}^{I}P_{i_1i_2} \left( {\tilde{s}}_{i_2x}\right) \right) , \end{aligned}$$

(20)

where

$$\begin{aligned} P_{i_1i_2} = \frac{\zeta _{i_1i_2}n_{i_2.}}{\sum _{i_2=1}^{I}\zeta _{i_1i_2}n_{i_2.}} \end{aligned}$$

such that

$$\begin{aligned} \sum _{i_2=1}^{I}P_{i_1i_2} = 1, \end{aligned}$$

where $\zeta _{i_1i_2}$ is defined as before. This can also be seen as the probability of contact between members of group x, weighted by a decreasing function of distance.

The Distance Decay Exposure (DDxPy) is given by:

$$\begin{aligned} \mathrm{DDxP}y=\sum _{i_1=1}^{I}\left( {\hat{s}}_{i_1x} \right) \left( \sum _{i_2=1}^{I}P_{i_1i_2} \left( {\tilde{s}}_{i_2y}\right) \right) \end{aligned}$$

(21)

where $P_{i_1i_2}$ is defined as before.

The DEL measure is given by the following equation:

$$\begin{aligned} \mathrm{DEL} = \frac{1}{2}\sum _{i=1}^{I}\left| {\hat{s}}_{ij} - \frac{a_i}{A} \right| , \end{aligned}$$

(22)

where $a_i$ is the area of unit i and A is the total area of the given region $A = \sum _{i=1}^{I}a_i$.
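
Eq. (22) compares where the group lives with how area is distributed. A minimal NumPy sketch follows; the function name `delta_index` and the toy areas are illustrative assumptions.

```python
import numpy as np

def delta_index(group, areas):
    """DEL from Eq. (22): half the summed gap between group and area shares."""
    group = np.asarray(group, dtype=float)
    areas = np.asarray(areas, dtype=float)
    s_hat = group / group.sum()        # share of the group in each unit
    area_share = areas / areas.sum()   # a_i / A
    return 0.5 * np.abs(s_hat - area_share).sum()

# Made-up toy data: the whole group lives in a unit covering 25% of the area.
del_value = delta_index([10, 0], [1.0, 3.0])
```

In this example, 75% of the group would need to move to match the area distribution, so the index is 0.75.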

The ACO Index is given by:

$$\begin{aligned} \mathrm{ACO} = 1-\frac{ \sum _{i=1}^{I}\left( \frac{n_{ij}a_i}{n_{.j}} \right) - \sum _{i=1}^{n_1}\left( \frac{n_{i.}a_i}{T_1} \right) }{ \sum _{i=n_2}^{I}\left( \frac{n_{i.}a_i}{T_2} \right) - \sum _{i=1}^{n_1}\left( \frac{n_{i.}a_i}{T_1} \right) }, \end{aligned}$$

(23)

where the units are ordered from smallest to largest in areal size. In this formula, $n_1$ is the rank of the unit at which the cumulative total population equals the total minority population, and $n_2$ is the rank of the unit at which the cumulative total population, counted from the largest unit down, equals the total minority population. In addition,

$$\begin{aligned} T_1 = \sum _{i=1}^{n_1}n_{i.}, \end{aligned}$$

and

$$\begin{aligned} T_2 = \sum _{i=n_2}^{I}n_{i.}. \end{aligned}$$

Another measure of concentration is the RCO Index:

$$\begin{aligned} \mathrm{RCO} = \frac{\frac{\sum _{i=1}^{I}\left( \frac{n_{ix}a_i}{n_{.x}} \right) }{\sum _{i=1}^{I}\left( \frac{n_{iy}a_i}{n_{.y}} \right) }-1}{\frac{\sum _{i=1}^{n_1}\left( \frac{n_{i.}a_i}{T_1} \right) }{\sum _{i=n_2}^{I}\left( \frac{n_{i.}a_i}{T_2} \right) }-1}, \end{aligned}$$

(24)

where $n_1$, $n_2$, $T_1$ and $T_2$ are defined as before.

The degree of centralization can be evaluated through the Absolute Centralization Index (ACE) or through the Relative Centralization Index (RCE):

$$\begin{aligned} \mathrm{ACE}= & {} \left( \sum _{i=2}^{I}X_{i-1}A_i \right) - \left( \sum _{i=2}^{I}X_{i}A_{i-1} \right) , \end{aligned}$$

(25)

$$\begin{aligned} \mathrm{RCE}= & {} \left( \sum _{i=2}^{I}X_{i-1}Y_i \right) - \left( \sum _{i=2}^{I}X_{i}Y_{i-1} \right) , \end{aligned}$$

(26)

where $A_i$ is the cumulative area proportion through unit i, $X_i$ is the cumulative frequency proportion through unit i of group x, and $Y_i$ is the analogous quantity for group y. In this measure, the areal units are ordered by increasing distance from the central business district, which we assume to be located at the average latitude and average longitude of all centroids.

The Dct Index, based on Ref. [7], evaluates the deviation from simulated evenness. It relies on the mean of the classical D over several simulations under evenness, drawn from the global minority proportion.

Let $D^*$ be the average of the classical D over simulations drawn assuming evenness from the global minority proportion. The value of Dct can be evaluated with the following equation:

$$\begin{aligned} \mathrm{Dct} = {\left\{ \begin{array}{ll} \frac{D-D^*}{1-D^*} &{} \text {if }D \ge D^*\vspace{6pt} \\ \frac{D-D^*}{D^*} &{} \text {if }D < D^*\end{array}\right. }. \end{aligned}$$

(27)
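
A minimal sketch of Eq. (27). It assumes that "evenness from the global minority proportion" means drawing each unit's group count from a binomial distribution with the unit's total population and the global proportion; the function names, the seed and the toy data are likewise assumptions.

```python
import numpy as np

def dissimilarity(group, total):
    """Classical D from Eq. (2)."""
    group = np.asarray(group, dtype=float)
    total = np.asarray(total, dtype=float)
    n_all = total.sum()
    p = group.sum() / n_all
    return np.sum(total * np.abs(group / total - p)) / (2 * n_all * p * (1 - p))

def dct(group, total, iterations=500, seed=0):
    """Dct from Eq. (27), with D* estimated by simulation under evenness."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group)
    total = np.asarray(total)
    p = group.sum() / total.sum()
    d_obs = dissimilarity(group, total)
    # One simulated region per row: every unit shares the global proportion p.
    sims = rng.binomial(total, p, size=(iterations, total.size))
    d_star = np.mean([dissimilarity(row, total) for row in sims])
    if d_obs >= d_star:
        return (d_obs - d_star) / (1 - d_star)
    return (d_obs - d_star) / d_star

# Made-up toy data: complete separation, then a perfectly even split.
dct_separated = dct([50, 50, 0, 0], [50, 50, 50, 50])
dct_even = dct([25, 25, 25, 25], [50, 50, 50, 50])
```

The two branches rescale the deviation so that the result lies between -1 (observed D of zero) and 1 (observed D of one).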

Similarly, the Gct Index, also based on Ref. [7], evaluates the deviation from simulated evenness. It relies on the mean of the classical G over several simulations under evenness, drawn from the global minority proportion.

Let $G^*$ be the average of G over simulations drawn assuming evenness from the global minority proportion. The value of Gct can be evaluated with the following equation:

$$\begin{aligned} \mathrm{Gct} = {\left\{ \begin{array}{ll} \frac{G-G^*}{1-G^*} &{} \text {if }G \ge G^* \vspace{6pt}\\ \frac{G-G^*}{G^*} &{} \text {if }G < G^* \end{array}\right. }. \end{aligned}$$

(28)

Lastly, the Bias-Corrected (Dbc) and Density-Corrected (Ddc) Dissimilarity indices are presented in Ref. [2]. The Dbc is given by:

$$\begin{aligned} D_\mathrm{bc} = 2D - {\bar{D}}_\mathrm{b}, \end{aligned}$$

(29)

where ${\bar{D}}_\mathrm{b}$ is the average of D over B resamples, each drawn from a multinomial distribution with the observed conditional probabilities, independently for each group.
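
The resampling step behind Eq. (29) can be sketched as follows. To avoid dividing by empty resampled units, the sketch uses the standard two-group form of D, $\frac{1}{2}\sum _i |{\hat{s}}_{ix}-{\hat{s}}_{iy}|$, which is algebraically equal to Eq. (2) when there are exactly two groups. The function names, seed and toy counts are assumptions.

```python
import numpy as np

def dissimilarity_two_group(x, y):
    """Two-group form of D: 0.5 * sum |x_i / X - y_i / Y|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * np.abs(x / x.sum() - y / y.sum()).sum()

def dbc(x, y, n_boot=500, seed=0):
    """Bias-corrected D of Eq. (29): 2 * D minus the mean resampled D."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    d_obs = dissimilarity_two_group(x, y)
    # Resample each group independently from its observed shares across units.
    x_sims = rng.multinomial(x.sum(), x / x.sum(), size=n_boot)
    y_sims = rng.multinomial(y.sum(), y / y.sum(), size=n_boot)
    d_bar = np.mean([dissimilarity_two_group(a, b)
                     for a, b in zip(x_sims, y_sims)])
    return 2 * d_obs - d_bar

# Made-up counts for two groups over five units.
x_counts = [30, 25, 20, 15, 10]
y_counts = [10, 15, 20, 25, 30]
d_plain = dissimilarity_two_group(x_counts, y_counts)
d_corrected = dbc(x_counts, y_counts)
```

Because the resampled average is non-negative, the corrected value can never exceed twice the observed D.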

The Ddc measure is given by:

$$\begin{aligned} D_\mathrm{dc} = \frac{1}{2}\sum _{i=1}^{I}{\hat{\sigma }}_in\left( {\hat{\theta }}_i \right) , \end{aligned}$$

(30)

where

$$\begin{aligned} {\hat{\sigma }}^2_i = \frac{{\hat{s}}_{ix} (1-{\hat{s}}_{ix})}{n_{.x}} + \frac{{\hat{s}}_{iy} (1-{\hat{s}}_{iy})}{n_{.y}}, \end{aligned}$$

and $n\left( {\hat{\theta }}_i \right) $ is the $\theta _i$ that maximizes the folded normal distribution $\phi ({\hat{\theta }}_i-\theta _i) + \phi ({\hat{\theta }}_i+\theta _i)$ where

$$\begin{aligned} {\hat{\theta }}_i = \frac{\left| {\hat{s}}_{ix}-{\hat{s}}_{iy} \right| }{{\hat{\sigma }}_i}, \end{aligned}$$

and $\phi $ is the standard normal density.

Table 4 Segregation measures-related literature for PySAL segregation module point estimations

B: Counterfactual composition details

Following the notation of Appendix A, and assuming that counterfactual values are being built for two different cities, we form the cumulative distribution functions (CDFs) of the compositions over all the tracts in City 1, $F^{(1)}({\tilde{s}}_{i,j}^{1,t})$, and in City 2, $F^{(2)}({\tilde{s}}_{i,j}^{2,t})$. To create a counterfactual distribution that imposes the attribute distribution of City 2 on the spatial structure of City 1, we take $p_{i,j}^{1,t} = F^{(1)}({\tilde{s}}_{i,j}^{1,t})$ and then generate $n_{i,j}^{1,t} |_{attr = 2} = {F^{(2)}}^{-1}(p_{i,j}^{1,t}) n_{i,.}^{1,t}$, where $attr = 2$ means that this population is calculated given the attributes of City 2. This process is carried out for all tracts of a group in City 1, and the majority group population is given by the difference $n_{i,.}^{1,t} - n_{i,j}^{1,t} |_{attr = 2}$. The populations for City 2 are generated analogously.
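
The construction can be sketched with empirical distribution functions. This is a minimal sketch under two assumptions that the text does not specify: the CDF of City 1 is taken as the share of tracts with a composition at or below each value, and the inverse CDF of City 2 is approximated with `np.quantile` and its default linear interpolation. The function name and the tract counts are made up.

```python
import numpy as np

def counterfactual_group(group_1, total_1, group_2, total_2):
    """Counterfactual group counts for City 1 given City 2's attributes."""
    group_1 = np.asarray(group_1, dtype=float)
    total_1 = np.asarray(total_1, dtype=float)
    group_2 = np.asarray(group_2, dtype=float)
    total_2 = np.asarray(total_2, dtype=float)
    s1 = group_1 / total_1   # compositions of City 1 tracts
    s2 = group_2 / total_2   # compositions of City 2 tracts
    # Empirical CDF of City 1 evaluated at each of its own tracts.
    p = np.searchsorted(np.sort(s1), s1, side="right") / s1.size
    # Inverse empirical CDF of City 2 (linear interpolation, an assumption).
    s_counterfactual = np.quantile(s2, p)
    return s_counterfactual * total_1

# Made-up tract counts for two cities.
total_city_1 = np.array([100.0, 100.0, 100.0])
cf_group = counterfactual_group(
    group_1=[10, 40, 25], total_1=total_city_1,
    group_2=[5, 50, 90, 20], total_2=[100, 100, 100, 100],
)
cf_majority = total_city_1 - cf_group   # remainder goes to the majority group
```

Each City 1 tract keeps its rank in the composition ordering but takes on the composition found at that rank in City 2, so the tract with the highest share in City 1 receives the highest share observed in City 2.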

About this article

Cite this article

Cortes, R.X., Rey, S., Knaap, E. et al. An open-source framework for non-spatial and spatial segregation measures: the PySAL segregation module. J Comput Soc Sc 3, 135–166 (2020). https://doi.org/10.1007/s42001-019-00059-3

Received: 29 June 2019
Accepted: 30 October 2019
Published: 23 November 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s42001-019-00059-3
