A multilevel structural equation modelling approach to study segregation of deprivation: an application to Bolivia

The study of segregation of deprivation can provide a tool to determine the economic, social and institutional factors associated with spatial unevenness in the distribution of wealth. Segregation is linked to social exclusion, diminished opportunities for human capital development and lower access to public services. In comparison to descriptive measures of poverty segregation, a multilevel structural equation modelling approach allows us to make statistical inferences about segregation, and to assess the extent to which segregation can be explained by contextual variables. Previous research using multilevel models to analyse segregation is extended to handle a continuous latent variable, measured by multiple binary indicators. The proposed approach is used to quantify the extent to which household deprivation is clustered within communities in Bolivia and to explore contextual factors associated with between-community differences in deprivation. Bolivia had one of the worst performances in poverty headcount ratio and chronic malnutrition in Latin America in the first decade of the twenty-first century, according to World Bank data. Bolivia is found to have a high level of segregation, since the main source of variation in deprivation arises from differences across communities, rather than within communities. Ethnicity, education, administrative region, distance to urban centres, and drought-induced migration significantly predict differences in the mean level of deprivation across Bolivian villages. This analysis helps to identify clusters of deprivation and highlights crucial sectors to be developed in order to reduce unevenness in the distribution of deprivation.


Introduction
Segregation can be defined as a form of physical separation where population groups are isolated into different neighbourhoods (in the case of residential segregation) or schools (in the case of educational segregation), "shaping the living environment at the neighbourhoods [or school] level" (Kawachi and Berkman 2003).
Geographical clustering of deprived people is commonly associated with economic, ethnic, or physical segregation, being the consequence of variation across areas in the characteristics under study. Segregation of deprivation may be related to social exclusion, with important consequences for social and health policies. Among the effects of social exclusion, we can highlight diminished access to public services and decreased opportunities for human capital development. In Bolivia, for instance, social exclusion has been identified as a possible mechanism through which individuals belonging to certain ethnic groups reside in areas that tend also to have lower education and income (Gray-Molina et al. 2002). There is some evidence that the opportunities and even the conduct of people residing in certain neighbourhoods are shaped, among other factors, by the characteristics of their neighbourhood (Jencks and Mayer 1990). Geographic and social isolation could therefore be among the factors underlying certain social pathologies among the poor (Greene 1991). The analysis of deprivation and poverty segregation can help to identify the most deprived areas, which are economically and socially isolated from the more developed areas. It can provide a tool to determine the economic, social and institutional factors related to spatial unevenness in the distribution of wealth over the area under investigation. Deprivation and poverty segregation might be particularly suitable for policy interventions related to urban planning at a more local level than the national or regional level (Amarasinghe et al. 2005). Moreover, since higher mortality rates and higher exposure to infectious diseases are likely to be found in contexts of concentrated deprivation (Fiscella and Franks 1997;Szwarcwald et al. 2002), reducing the differences in deprivation among communities might also be associated with better health outcomes.
This study builds on the previous use of multilevel modelling to assess social segregation in schools and areas using a single binary or categorical socioeconomic indicator (Goldstein and Noden 2003;Leckie et al. 2012;Jones et al. 2018a). The main contribution of this paper is that the outcome of interest, household deprivation, is treated as a continuous latent variable, measured by a set of multiple correlated indicators. Multilevel structural equation modelling (SEM) allows the simultaneous creation of a latent variable for household deprivation, and its decomposition into between-community and between-household within-community components to measure segregation of deprivation. Moreover, multilevel modelling allows us not only to describe patterns of segregation, but also to investigate the contextual factors associated with deprivation segregation, since it might be of interest to examine whether average levels of segregation vary across communities as a function of community characteristics (Bruch and Atwell 2015).
The proposed multilevel SEM is applied in a study of segregation of deprivation in Bolivia in 2008 using survey data linked to global positioning system (GPS) data. By the end of the first decade of the millennium, Bolivia was one of the poorest countries in South America (Population Reference Bureau 2013), and more than half of the population fell below the poverty line, mostly in rural areas (World Bank 2014). Bolivian economic inequality is still great, with a Gini coefficient of 51.4 in 2008 (against an average of 49.9 of the other South American countries). The distribution of wealth within the country was not uniform, with considerable geographic and ethnic dissimilarities (Schroeder 2007). First, the extent of segregation of deprivation across Bolivian communities is quantified, and then area-level variables are used to explain the variation across communities, while allowing for segregation due to unmeasured area characteristics. The latent variable for household deprivation can be considered an alternative to previous indices, since it takes into account only items related to housing conditions with a sufficient degree of correlation among them, and which can therefore be considered manifestations of the underlying concept of household deprivation.

Descriptive segregation measures
The traditional approach in the study of segregation involves the use of descriptive indicators. The most widespread descriptive measure of segregation is the dissimilarity index (Duncan and Duncan 1955), which can be interpreted as the percentage of one of the population groups (for instance, the white population in the case of racial segregation) that would have to move to different areas in order to reproduce a distribution matching that of the larger areas. The dissimilarity index has been widely used in the deprivation and poverty segregation literature (Bibby 1975;Mershrod 1981;Napierala and Denton 2017), including what is, to the best of our knowledge, the only study on segregation in Bolivia, which investigated residential segregation in ten Bolivian cities (Gray-Molina et al. 2002). A drawback of the dissimilarity index is that it allows us to compute segregation only between two groups. Theil's (1972) information theory index, Bell's (1954) and Lieberson's (1981) isolation indices for multiple populations, and James' (1986) generalized exposure-based segregation index allow the calculation of segregation among multiple groups. Other measures of segregation that are based on the departure of each observation from measures of central tendency are the variance ratio index (Zoloth 1976), Atkinson's family of segregation indices (Allison 1978), and the square root index (Hutchens 2001). Measures of variation based on the departure of each observation from all other observations, such as the Gini coefficient (Dorfman 1979), can also be interpreted as measures of segregation (Kim and Jargowsky 2009). As with the dissimilarity index, the Gini coefficient is related to the Lorenz (or segregation) curve (Gastwirth 1972). The standardized versions of these indices range from 0 (no segregation, i.e. all areas have the same proportion of population groups) to 1 (complete segregation, i.e. each area is composed of just one of the population groups) (Massey et al. 1996).
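As an illustration of the dissimilarity index described above, the following minimal sketch computes it as half the sum of absolute differences between the area shares of two groups. The function name and the household counts are hypothetical and are not taken from the Bolivian data.

```python
def dissimilarity_index(group_a, group_b):
    """Dissimilarity index for two groups across areas.

    group_a[i] and group_b[i] are the counts of each group in area i.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per area")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one member")
    # Half the sum of absolute differences between the two area shares.
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))


# Hypothetical counts of deprived and non-deprived households in four areas.
deprived = [40, 30, 20, 10]
non_deprived = [10, 20, 30, 40]
print(round(dissimilarity_index(deprived, non_deprived), 3))  # 0.4
```

With these illustrative counts, 40% of one group would have to change area for both groups to be distributed identically.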
The descriptive segregation measures described above are aspatial, meaning that they do not take into account the spatial proximity of the observations (Morrill 1991). A recent development in the measure of segregation involves the spatial dimension of segregation, for instance by including the length of shared boundaries (Wong 1993), or by using GPS data (Matthews and Parker 2013). The gradient of spatial segregation can be measured by spatial autocorrelation (Cliff et al. 1973), which has been widely used in the literature (Chakravorty 1996;Dawkins 2007; Amara and El Lahga 2016).

The above-mentioned indices are descriptive, meaning they are based on observed proportions of population groups that include the effect of random sampling variation (Allen et al. 2015). In other words, they fail to take into account the probabilistic component resulting from the sampling process; stochastic variation due to population sampling can bias segregation measurement, especially when small numbers are involved (Kish 1954;Leckie and Goldstein 2015). For instance, Leckie et al. (2012) pointed out that the dissimilarity index, which is based on observed rather than underlying proportions, has sources of bias depending on the size of the areas and on the underlying proportions; when analysing small areas, the dissimilarity index systematically overestimates segregation, suffering from the upward bias of the null (Allen et al. 2015). Bruch and Mare (2006) highlighted that indices of segregation that divide the population into categories according to some threshold, such as the dissimilarity index, are sensitive to the choice of the thresholds. Finally, it is not possible to investigate the factors associated with deprivation segregation when descriptive measures are used (Owen 2015).
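The upward bias under the null can be illustrated with a small simulation. The sketch below assumes that every area has the same true proportion of deprived households, so any measured segregation is pure sampling noise. The area sizes, the proportion and the seed are arbitrary choices for illustration, not values from the paper.

```python
import random


def dissimilarity_index(group_a, group_b):
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))


def simulated_index(n_areas, area_size, p_deprived, rng):
    """Dissimilarity index when every area shares the same true proportion."""
    deprived, non_deprived = [], []
    for _ in range(n_areas):
        # Each household is deprived with the same probability in every area.
        count = sum(rng.random() < p_deprived for _ in range(area_size))
        deprived.append(count)
        non_deprived.append(area_size - count)
    return dissimilarity_index(deprived, non_deprived)


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
d_small = simulated_index(n_areas=200, area_size=10, p_deprived=0.3, rng=rng)
d_large = simulated_index(n_areas=200, area_size=1000, p_deprived=0.3, rng=rng)
print(f"areas of 10 households:   D = {d_small:.3f}")
print(f"areas of 1000 households: D = {d_large:.3f}")
```

Although the true level of segregation is zero in both settings, the index computed from small areas should be markedly larger than the one computed from large areas.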

Multilevel modelling for studying segregation
A multilevel model approach overcomes the above-mentioned limitations, by separating the component of the observed proportion that is due to sampling variation. Segregation can be measured by estimating the higher-level variance parameter in the multilevel model (Goldstein and Noden 2003). This allows the assessment of the proportion of variation in the characteristic of interest that is due to the grouping of individuals within areas: the larger it is, the more segregated the neighbourhoods or schools are. By estimating standard errors, a statistical inference on segregation can be made (ibid). Moreover, multilevel models can be used to explore sources of segregation by including contextual covariates in the models (Leckie et al. 2012).
The first paper in this stream of literature is by Goldstein and Noden (2003), who measured the evenness of the distribution of disadvantaged students across English schools in the period 1994-1999, using a binary variable as the outcome, namely students' eligibility for free school meals. Since then, a growing number of studies using a multilevel approach have appeared in the literature. Three-level models were first used by Leckie et al. (2012) to study social segregation in schools, with students nested within schools nested within London local authorities. They were followed by other researchers, who applied the models to the study of the ethnic distribution within cities (Leckie and Goldstein 2015;Manley et al. 2015;Johnston et al. 2016;Jones et al. 2018a), allowing simultaneous estimation of the micro-, meso- and macro-effects of segregation. Leckie and Goldstein (2015) and Manley et al. (2015) extended the multilevel binomial logistic regression used in previous work to a multilevel multinomial logistic regression to model segregation by a categorical variable. A multilevel approach to the computation of the dissimilarity index was developed by Harris (2017) and Harris and Owen (2017) when studying the residential segregation of students in England. Moreover, multilevel models can be extended to take into account the spatial proximity of areas, by including spatial weights (Jones and Subramanian 2014) and dependencies between areal units (Dong and Harris 2015;Jones et al. 2018b).
The present analysis involves a continuous latent dependent variable measured by multiple binary indicators as an outcome, and therefore requires an extension in a SEM framework of the multilevel models used in previous work. An application to Bolivia is proposed in the last sections of the paper, in order to quantify the extent of segregation in the country and to explore contextual factors associated with differences in the mean deprivation across communities.

Latent variable model for household deprivation
An index measuring deprivation (or wealth) is an alternative to monetary measures such as income or expenditure, which are often unavailable or unreliable in low- or middle-income countries (Filmer and Kinnon 2012). Deprivation can be considered as a concept underlying certain characteristics of living standards and can therefore be derived from a set of observable items.
A key point in the creation of a composite index of deprivation is the choice of weights to be assigned to the observed items. Many approaches exist in the literature, ranging from the simple sum of the owned items to more sophisticated data-driven techniques that take into account the extent to which each item discriminates between households' deprivation (Vandemoortele 2014). Among these composite indicators, the DHS wealth index, built from principal component analysis (PCA), is probably the most widespread (Rutstein and Johnson 2004). In the following sections, a critique of the construction of the DHS wealth index is presented, and a latent variable approach is proposed.

Critique of the DHS wealth index
The DHS wealth index is constructed by means of PCA, a technique that transforms a set of observed correlated items into a set of linearly uncorrelated principal components by means of an orthogonal transformation (Jolliffe 1986). PCA's major limitation is that it does not take into account the categorical nature of the observed indicators, treating them as continuous, which is analogous to using an OLS regression for the analysis of a categorical outcome (Howe et al. 2008). The wealth index scores are built from the first principal component, which often explains only a low proportion of the total variation in the observed items (Kolenikov and Angeles 2004). Moreover, since the correlation between the observed indicators has not been investigated before the analysis, the linear dependence between the items could lead to incorrect estimates of the wealth index (ibid.). Finally, using the DHS wealth index as a measure of deprivation in further analyses ignores the measurement error that arises from constructing an index from a set of items.

Rationale for the construction of a latent variable for household deprivation
SEM is a latent variable approach that incorporates a model for the relationship between a continuous latent variable and a set of observed items, considered as the manifestation of the latent variable (Bartholomew and Knott 2011). In this case, for instance, a set of observed items relating to housing conditions and living standards are combined into a latent variable for household deprivation.
A SEM is composed of a measurement model and a structural model, estimated simultaneously. The measurement model describes the relationship between the observed items and the latent variable. The structural model is a regression of the latent variable on a set of covariates (Bartholomew and Knott 2011). In contrast to PCA, the items included in the measurement model of SEM can be binary or polytomous (ibid.). Weights are assigned to the items depending on their ability to discriminate between households' scores on the latent variable. By estimating standard errors, SEM also allows testing hypotheses involving parameters of both the measurement and structural models. An important feature of SEM is that it takes into account the measurement error which may bias the estimates of the level of segregation within communities. Latent variables do not have measurement error associated with them, since they are not directly measured, therefore the association between them and other covariates can be estimated without any bias (Muthén and Muthén 2010).
In comparison to the DHS wealth index, a further development of the proposed approach is the selection of the observed items, which is based on the correlation matrix of all items. Only items relating to the latent concept of deprivation are included in the measurement model, as explained later.

Measurement model
The measurement model specifies the relationship between the latent variable and the observed items. Denote by $y_{rjk}$ the $r$th item ($r = 1, \ldots, p$) of household $j$ ($j = 1, \ldots, n_k$), nested within community $k$ ($k = 1, \ldots, K$). Then the logit of the probability that household $j$ in community $k$ owns item $r$ is:
$$\operatorname{logit}\left[\Pr(y_{rjk} = 1 \mid \eta_{jk})\right] = \alpha_{r0} + \alpha_{r1}\,\eta_{jk} \qquad (1)$$
where $\eta_{jk} \sim N(0, \sigma^2)$ is the latent variable for household deprivation and $\alpha_{r0}$ and $\alpha_{r1}$ are, respectively, the difficulty and the discrimination parameters. The difficulty parameter $\alpha_{r0}$ indicates how "difficult" an item is to be owned, while the discrimination parameter $\alpha_{r1}$ indicates how well the $r$th item discriminates between households with different scores for deprivation. In order to identify the model, some constraint must be imposed on the item parameters. It is common to constrain one of the $\alpha_{r1}$s to 1, which sets the scale of the latent variable to be equal to the scale of the chosen item.
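To make the role of the two item parameters concrete, the following sketch evaluates the item response curve implied by a logit-linear measurement model of this kind. It assumes an intercept-plus-slope form, with the difficulty parameter entering as the intercept and the discrimination parameter as the slope on the latent score (written eta here). The parameter values are hypothetical and are not the estimates reported in the paper.

```python
import math


def ownership_probability(eta, difficulty, discrimination):
    """Inverse logit of difficulty + discrimination * eta."""
    return 1.0 / (1.0 + math.exp(-(difficulty + discrimination * eta)))


# Hypothetical item parameters (difficulty, discrimination).
items = {"weakly discriminating item": (0.0, 0.5),
         "strongly discriminating item": (0.0, 2.0)}
for name, (difficulty, discrimination) in items.items():
    low = ownership_probability(-1.0, difficulty, discrimination)
    high = ownership_probability(1.0, difficulty, discrimination)
    print(f"{name}: P(eta=-1) = {low:.3f}, P(eta=+1) = {high:.3f}")
```

The larger the discrimination parameter, the more sharply the ownership probability changes between households with low and high latent scores, which is why such items receive more weight in the latent variable.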

Multilevel structural model
In this paper, the multilevel structural models specify the partitioning of the variance into a between-community component and a within-community between-household component. Of particular interest is the extent to which community variation can be explained by the community-level covariates described earlier. An important characteristic of multilevel SEM is that the creation of the latent outcome variable and the analysis of its between- and within-community components are done simultaneously, while accounting for measurement error (Muthén and Muthén 2010).
The structural model specifying the decomposition of the latent variable $\eta_{jk}$ into its within- and between-community components is:
$$\eta_{jk} = u^{(hh)}_{jk} + u^{(PSU)}_{k}$$
where $u^{(hh)}_{jk} \sim N\left(0, \sigma^{2(hh)}_{u}\right)$ is the household residual and $u^{(PSU)}_{k} \sim N\left(0, \sigma^{2(PSU)}_{u}\right)$ is the community-level random effect. They represent, respectively, the within-community and the between-community components of household deprivation, and their variances $\sigma^{2(hh)}_{u}$ and $\sigma^{2(PSU)}_{u}$ are the within-community and the between-community variances. Segregation of deprivation is strictly related to variation across communities. In fact, the higher the between-community variation of the level of deprivation in a country, the higher the level of grouping of deprived people within geographical areas. On the other hand, no between-community variation indicates that no segregation is present in a country (Bulle 2016).
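The link between the between-community variance and segregation can be illustrated by simulation. The sketch below draws a score for each household as the sum of a community effect and a household residual, then recovers the between-community share of variance with a simple moment-based estimator. This is an illustrative stand-in rather than the maximum likelihood estimation used in the paper, and the variances, sample sizes and seed are assumptions chosen for the example.

```python
import random
import statistics


def simulate_scores(n_communities, households_per_community,
                    var_between, var_within, rng):
    """Draw scores as a community effect plus a household residual."""
    scores = []
    for _ in range(n_communities):
        u_psu = rng.gauss(0.0, var_between ** 0.5)  # community-level effect
        scores.append([u_psu + rng.gauss(0.0, var_within ** 0.5)
                       for _ in range(households_per_community)])
    return scores


def between_share(scores):
    """Moment-based estimate of the between-community share of variance."""
    n = len(scores[0])  # assumes equally sized communities
    within = statistics.mean(statistics.variance(c) for c in scores)
    means = [statistics.mean(c) for c in scores]
    # The variance of community means overstates the between component
    # by within / n, so that part is subtracted.
    between = max(statistics.variance(means) - within / n, 0.0)
    return between / (between + within)


rng = random.Random(1)  # arbitrary seed
scores = simulate_scores(1000, 20, var_between=4.0, var_within=1.0, rng=rng)
print(f"estimated between-community share: {between_share(scores):.2f}")
```

With a true between-community variance of 4 and a within-community variance of 1, the estimated share should be close to 0.8; setting the between-community variance to zero would correspond to no segregation.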
The models are fitted by maximum likelihood, and likelihood ratio tests can be used to compare the fit of nested models. The analyses have been carried out using the gsem command in the Stata software (StataCorp 2013).
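A likelihood ratio comparison of two nested models can be sketched as follows. The example assumes that the models differ by a single parameter, so the statistic is referred to a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical and the function name is illustrative.

```python
import math


def likelihood_ratio_test(loglik_restricted, loglik_full):
    """LR statistic and chi-square p-value for one extra parameter (df = 1)."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the restricted one")
    # For a chi-square with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_restricted=-1005.0, loglik_full=-1002.0)
print(f"LR = {stat:.2f}, p = {p:.4f}")
```

When the parameter being tested is a variance, the null value lies on the boundary of the parameter space and the usual chi-square reference distribution is conservative, so a p-value obtained in this way is best read as an upper bound.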

Potential explanations for geographical segregation of deprivation in Bolivia
An application of the SEM models explained earlier is here proposed to explain the segregation of deprivation in Bolivia, by looking at the potential factors associated with the between-community variation in deprivation. Among these, ethnic composition, education, distance to urban centres and drought-induced rural-urban migration can have a central role.
The first factor that may affect the segregation of deprivation is ethnicity. The Bolivian population is mainly indigenous, and the ethnic distribution is not uniform, with indigenous populations more concentrated in certain areas, mainly the Altiplano (high plateau) and Valle (valley) regions. Almost the whole indigenous population (97.5%) of rural areas is found to be chronically poor (Castellanos 2007), since the lack of social welfare programmes leads to a high vulnerability to shocks such as droughts, floods and hailstorms (Buzaglo and Calzadilla 2009).
Education can play a role in explaining between-community variation in the level of deprivation in the country. The link between parental education and the socioeconomic status of a household is well established (Cornia 2014;King and Hill 1993). Education can also be a contextual factor in determining the unevenness of the distribution of deprivation across Bolivian communities. The average degree of education in the community can set the context for a wide set of socioeconomic factors, including economic disadvantage (Wight et al. 2006) which lead to the geographical segregation of deprivation.
Distance to urban centres might also explain deprivation segregation. The social segregation studied by Gray-Molina et al. (2002) in Bolivian urban environments can be extended to rural areas. The main activity in rural areas is farming: peasants are vulnerable to shocks linked to climate change, such as drought (Castellanos 2007), and the lack of roads might affect peasants' access to the market (Buzaglo and Calzadilla 2009). Rural areas are also associated with a lack of infrastructure (Andersen 2002) and basic services like sanitation and availability of clean water (Coa and Ochoa 2009), creating a setting of a higher mean level of deprivation.
Finally, Bolivia has been subject to natural disasters over the last decades. In particular, prolonged droughts have affected the South-West part of the country (Kessler and Stroosnijder 2006). Agriculture and livestock rely strongly on vegetation resources, the availability of which can be jeopardized by these events: it has been calculated that, in the period 1953-1993, Bolivia lost 30% of its agricultural productivity, and one of the main reasons is related to soil erosion (Benton 1993). Droughts have fostered migration towards the cities. Bolivia faced a rapid process of urbanization, either temporary or permanent, between the 1980s and the 2000s (World Bank 2015). Drought-driven rural-urban migration can lead to the uneven residential sorting of rural migrants within cities, which leads to a rise in the level of urban residential segregation. Moreover, there is some evidence of a recent trend towards migration differentiated by age-group. The main mechanism is related to the fact that young men are gradually excluded from access to agricultural soil, due to the increased unavailability of land (Balderrama 2011). Lands are usually distributed among the children, but there is evidence of the tendency of migrant young men to refuse their share of the inheritance (Michels 2011). This selective migration (Borjas and Tienda 1987) can therefore be another explanation for the segregation of deprivation in Bolivia.

Data and measures
The Demographic and Health Surveys (DHS) collect data on a broad range of aspects related to health and living conditions. In the sampling process, clusters of a standard size of 100 households are identified and mapped in the territory of the country under investigation, and a further selection within each of these selected clusters is made: each of these areal units serves as a primary sample unit (US Aid 2012). In this paper, primary sample units are considered to be proxies for the respondents' communities, as in previous studies (Uthman et al. 2011;Robson et al. 2012).
The 2008 Bolivian DHS dataset contains 19,564 households from 999 communities. Among them, 11,361 households have complete records on the ownership of the items related to housing conditions and on the variables included as predictors in the structural model. The full set of items related to housing conditions, living standards and owned assets available in the DHS dataset includes: availability of electricity, availability of clean water, type of sanitation, material of the floor, type of cooking fuels, and ownership of refrigerator, radio, television, motorbike, car, telephone and bicycle. These are the items used in the construction of the DHS wealth index, a composite measure of a household's cumulative living standard (Rutstein and Johnson 2004). All the observed variables have been dichotomized, in order to simplify the interpretation of the parameters of the models. As noted earlier, there are four main factors that can be linked to the between-community variation in deprivation: ethnicity, education, distance to urban centres and drought-induced migration. These are represented by six explanatory variables listed in Table 1. All of these (except group mean centred years of male education) have been measured at the community level.
The contextual binary variable Indigenous, provided by DHS, indicates whether a household lives in a community which has a majority of indigenous or non-indigenous villages. The mean level of male education within each community has been chosen as a contextual variable. When including a contextual variable calculated as the mean of a household-level variable, it is common to include the group mean centred household-level variable, in order to separate the between- and within-community effects (Snijders and Bosker 2012). For households with more than one adult male (5.97% of the total), the mean value of years of schooling of the males registered at that household has been calculated. In general, individual-level male education can better explain the level of deprivation than female education: paternal rather than maternal income is a strong determinant of the wealth status of the household (Cornia 2014;Thomas 1990), and in Bolivian indigenous groups, men are more likely to assume the position of breadwinners (Paulson et al. 1990).
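The separation into between- and within-community effects rests on group mean centring, which can be sketched as follows. Each household value is split into its community mean (the contextual variable) and its deviation from that mean (the centred household-level variable). The community labels and years of schooling are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_between_within(records):
    """Return (community_mean, centred_value) for each (community, value) record."""
    by_community = defaultdict(list)
    for community, value in records:
        by_community[community].append(value)
    community_means = {c: mean(v) for c, v in by_community.items()}
    return [(community_means[c], value - community_means[c])
            for c, value in records]


# Hypothetical years of male education for households in two communities.
records = [("A", 4), ("A", 6), ("A", 8), ("B", 10), ("B", 12)]
for (community, years), (between, within) in zip(records,
                                                 split_between_within(records)):
    print(community, years, between, within)
```

Because the centred values sum to zero within every community, they are uncorrelated with the community means, so the two coefficients can be interpreted separately.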
The distance from the centroid of each DHS cluster to the closest municipal capital has been obtained by linking the DHS GPS dataset and the GeoBolivia dataset (GeoBolivia 2017a), which provides the location of the 339 Bolivian municipal capitals. The distance has been calculated using the haversine formula (Robusto 1957). The distance to the closest municipal capital can provide a better measure of the variation between urban and rural environments, approaching the concept of Woods' (2003) "urban-rural continuum". The mean distance of the communities labelled as urban in the DHS variable is 3.88 km, while it is 16.84 km for the rural communities. The variable related to risk of drought has been created by linking the DHS GPS dataset with the 2002 National System for Early Alert of Food Security (Sistema Nacional de Seguridad Alimentaria Alerta Temprana, SINSAAT) (GeoBolivia 2017b). This dataset classifies areas into four levels of drought risk, depending on the frequency of drought over the period 1972-2002. Very low risk is defined as one or no drought every fifth year over the 30-year period, low risk as a drought every fourth year, medium risk as a drought every second year and high risk as four or more droughts every 5 years.
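The distance calculation can be sketched as follows: the great-circle distance between two points is computed from their latitudes and longitudes, and the smallest distance over a list of capitals is retained. The Earth radius of 6371 km and all coordinates are assumptions made for illustration; the paper does not state the radius used, and the points below are not actual DHS clusters or municipal capitals.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_to_closest(centroid, capitals):
    """Smallest haversine distance from one centroid to a list of capitals."""
    return min(haversine_km(*centroid, *capital) for capital in capitals)


# Hypothetical cluster centroid and municipal capitals (latitude, longitude).
centroid = (-17.0, -66.0)
capitals = [(-17.1, -66.0), (-18.0, -65.0), (-16.0, -68.0)]
print(f"{distance_to_closest(centroid, capitals):.1f} km")
```

A lookup of this kind is linear in the number of capitals for each cluster, which is unproblematic for a few hundred capitals and fewer than a thousand clusters.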
In the most recent DHS surveys, each community is georeferenced during the sample listing process. The GPS readers are in general accurate to less than 15 metres, but the GPS coordinates of each community are randomly displaced for reasons of confidentiality: the error ranges from 0 to 2 km for urban communities and from 0 to 5 km for rural communities (Perez-Heydrich et al. 2013). While cluster displacement might induce large misclassification errors when calculating the distance between clusters' centroids and health facilities or other specific locations (Skiles et al. 2013), the random displacement of the centroid of the communities is unlikely to affect the results of this study. First, the region of each community is directly calculated from DHS, so no issue of displacement arises even when the random error is introduced. Second, the distance to the closest municipal capital is the variable most likely to be affected by the random error, but it is still considered a better approximation of the rural-urban continuum (Woods 2003) than the binary variable provided by DHS, which has only the two categories "urban" and "rural". Third, the areas for risk of drought are very large and the risk of displacing a community into a different area is very low.

Selection of deprivation indicators
The full set of 12 items available in the DHS dataset included Electricity, Water, Sanitation, Floor, Cooking fuels, Radio, Television, Refrigerator, Motorbike, Bicycle, Car and Telephone. These are the same items used for the construction of the DHS wealth index. These items were divided into two sets: the first five items were related to the living environment, while the last seven were assets or possessions. The aim of the investigation of the correlation matrix was to select the observed items used to construct the latent variable, in order to avoid multicollinearity and to have a coherent set of indicators measuring household deprivation. Tetrachoric correlations estimated the correlation between two theorized normally distributed latent variables from two observed binary variables (Divgi 1979). With the aim of analysing a unique latent variable for household deprivation, the aforementioned observed variables were selected according to their tetrachoric correlations. The items Bicycle, Motorbike, Car, and Radio showed a weak tetrachoric correlation with the rest of the items, and were therefore excluded from the measurement model. Although the correlations between Television, Telephone and the retained items were sufficiently strong, they were excluded from the measurement model on a theoretical basis. These items cannot be considered as basic needs in the context of a low-income country such as Bolivia. On the other hand, the asset Refrigerator is the only one that was retained in the measurement model, due to its strong association with health outcomes. By allowing us to keep food fresh, a refrigerator can indeed be related to hygiene and diseases (Lagendijk et al. 2008). Therefore, the six selected items for the measurement model of household deprivation were Electricity, Water, Sanitation, Floor, Cooking fuel and Refrigerator. 
These items had a tetrachoric correlation higher than 0.5 (Table 2), suggesting that they were manifestations of the same underlying concept.
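To give a sense of how a tetrachoric correlation relates to a two-by-two table of binary items, the sketch below uses the classical cosine-pi approximation. This is a rough closed-form shortcut, not the estimation algorithm of Divgi (1979) and not necessarily the routine used for Table 2. The cell counts are hypothetical.

```python
import math


def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation from a 2x2 table [[a, b], [c, d]].

    a and d count concordant households (both items owned, neither owned);
    b and c count discordant households.
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("the approximation is undefined for this table")
        return 1.0  # no discordant pairs in at least one direction
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


# Hypothetical table for two items: 40 households own both, 40 own neither,
# and 10 own only one of the two in each direction.
print(round(tetrachoric_approx(40, 10, 10, 40), 3))  # 0.809
```

Under this approximation a table with equal counts in all four cells gives a correlation of zero, and the value moves towards 1 as the concordant cells dominate.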

Measurement model for household deprivation
The measurement model of Eq. (1) can be interpreted as a single-level model. The total variance of the latent variable, $\sigma^2$, was estimated as 19.15. The Spearman rank correlation between the single-level latent variable and the DHS wealth index was high, with a value of 0.92. This result is consistent with previous attempts to construct a latent variable for wealth (Vandemoortele 2014).
Note that the discrimination parameter related to the item Electricity was constrained to 1 for identification. As can be seen in Table 3, Cooking fuel and Electricity were the items that best discriminated between households with different deprivation scores, while Water and Sanitation had the least discriminatory power. Therefore, having electricity discerned household deprivation better than, for instance, having clean water. Moreover, Water and Sanitation were the items most likely to be owned (those with lower values in the difficulty parameters), while Cooking fuel was the least likely.

Results from the empty multilevel model
The aim of the multilevel structural models of Eqs. (2) and (3) was to analyse the distribution of the latent variable for household deprivation between and within Bolivian communities. In the multilevel model, the between- and within-community variance components were, respectively, 19.51 and 1.77. The intra-community correlation, that is the proportion of variation in the latent variable explained by the grouping of households within communities, allowed an assessment of the level of segregation: a high level of community-level variance reflects substantial differences in household deprivation across communities (Leckie et al. 2012). For this model, a high proportion of variation in the latent variable (around 92%) was due to the grouping of households within communities. Thus, households within the same community had very similar scores on the latent variable of deprivation. This finding is consistent with previous studies: Castellanos (2007) points out the relatively low level of inequality among indigenous households in rural Bolivian communities.
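The figure of around 92% follows directly from the two reported variance components, as the short check below shows. Only the function name is an assumption; the two inputs are the estimates quoted above.

```python
def intra_community_correlation(var_between, var_within):
    """Share of the total variance that lies between communities."""
    return var_between / (var_between + var_within)


# Between- and within-community variance components reported in the text.
icc = intra_community_correlation(19.51, 1.77)
print(f"{icc:.3f}")  # 0.917
```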

Results from the models including contextual factors of deprivation segregation
SEM allowed investigation of the factors associated with deprivation segregation, by including community-level covariates in the model. Table 4 shows the results of the univariate and multivariate models of Eqs. (2) and (3). First, the coefficient of Indigenous was significantly negative: communities with a majority indigenous population were more likely to have higher mean deprivation. Indigenous origins are associated with poverty in rural Bolivian communities (Albo 1994; Grootaert and Narayan 2004); the Bolivian indigenous population is mainly clustered in the Altiplano and Valle regions in isolated rural communities, with high vulnerability to natural hazards and a subsequent lack of roads, access to markets, and social infrastructure (Buzaglo and Calzadilla 2009). Therefore, due to their disadvantaged position, the concentration of indigenous households in certain areas leads to the segregation of deprivation. Second, both coefficients related to male education were significant and positive. The between effect indicated that the higher the mean level of male education within a community, the lower the mean level of deprivation of that community. Education underlies a broad range of socioeconomic factors, including economic conditions (Wight et al. 2006), and differences in education can therefore contribute to deprivation segregation. While indigenous origins are associated with lower formal education in the literature (Castellanos 2007), the multivariate model in this paper indicated that education remains associated with segregation of deprivation after taking ethnicity into account.
Third, two regions, Potosí and Beni, had a significantly higher level of deprivation than La Paz. The territory of Potosí, located in the south-west of the country, is mainly mountainous, which limits accessibility and makes extensive agricultural exploitation difficult. This region has the highest presence of indigenous population (Castellanos 2007), and has been affected several times by severe drought (Gray-Molina et al. 2002). Beni's case is different: this region is rich in raw materials and is one of the biggest agricultural centres in Bolivia (Vadez et al. 2004). Despite its richness in natural resources, the level of poverty is still high, since Beni is a mainly rural territory that lacks big urban centres and is logistically marginal compared with the leading Bolivian economic poles (Weisbrot and Sandoval 2008).
Fourth, the coefficient of Distance to municipal capital was significantly negative: every additional kilometre of distance from the closest municipal capital was associated with an average decrease of 0.18 in the community-level score of the latent variable for household deprivation. Rural populations are strongly dependent on farming productivity, which leads to a high vulnerability to shocks such as drought or flooding (Castellanos 2007). Rural populations are also exposed to endemic diseases that can affect labour productivity and consequently levels of deprivation (Buzaglo and Calzadilla 2009), since 26.7% of rural households retrieve water from a source considered unsafe, and 56.7% lack basic sanitation services (against 5.4% and 9.3%, respectively, in urban areas) (Coa and Ochoa 2009).
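To give a sense of scale, the per-kilometre association can be set against the between-community spread of the latent variable. The sketch below is a back-of-the-envelope illustration only: it assumes the association is linear over the range considered, uses the between-community variance from the empty model (19.51) as the yardstick, and the 10 km distance is a hypothetical value.

```python
import math

# Reported association: change in the community-level latent score per additional km.
PER_KM_CHANGE = -0.18
# Between-community variance from the empty multilevel model, used here as a yardstick.
BETWEEN_VARIANCE_EMPTY_MODEL = 19.51


def expected_score_change(distance_km):
    """Expected change in the community-level score, ASSUMING a linear association."""
    return PER_KM_CHANGE * distance_km


between_sd = math.sqrt(BETWEEN_VARIANCE_EMPTY_MODEL)

# Hypothetical comparison: a community 10 km farther from its municipal capital.
change_10km = expected_score_change(10)
change_in_sd_units = change_10km / between_sd

print(f"change in score over 10 km: {change_10km:.2f}")
print(f"in between-community SD units: {change_in_sd_units:.2f}")
```

Under these assumptions, an extra 10 km corresponds to a difference of roughly 0.4 between-community standard deviations in the latent score.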
Moreover, the coefficients indicated that the communities located in the medium- and low-risk areas of drought had a lower mean level of deprivation than the communities in areas of high risk. Climate change has triggered rural-urban migrations; a rapid process of urbanization was observed in Bolivia between the 1980s and the 2000s (World Bank 2015). Punch (2004) observes that in a rural Bolivian village in Tarija (located in the area at medium risk of drought) migration rather than education is considered the best way to improve living standards, since migrant work offers more security and immediate benefits. Rural-urban migration was associated with the uneven residential sorting of the migrants within the urban environment, increasing the level of urban residential segregation.
The multivariate model, which included these variables simultaneously, showed little difference from the univariate results: rural, indigenous communities with a lower mean level of male education and at higher risk of drought were significantly more likely to have higher mean deprivation. Region was not included in the multivariate model, since it was highly correlated with Risk of drought: the areas of risk overlapped with many of the Bolivian regions. Risk of drought was preferred because of its higher theoretical value as a potential explanation for segregation of deprivation within communities, being a cause of selective rural-urban migration (Balderrama 2011).

Discussion
This paper proposes a general SEM approach to the study of geographical segregation, by extending the multilevel modelling approach proposed by Goldstein and Noden (2003) to handle constructs measured by multiple indicators. This approach makes it possible not only to quantify the extent of segregation but also to model patterns of segregation as functions of contextual factors.
The proposed multilevel SEM approach is applied in a study of deprivation segregation in Bolivia, a country with some of the highest indicators of poverty and deprivation in Latin America (Coa and Ochoa 2009). Using 2008 DHS data, a latent variable for household deprivation was created from a set of six observed items and simultaneously included in the SEM models, overcoming issues related to measurement error (Muthén and Muthén 2010). Bolivia was found to have a high level of segregation of deprivation, since a high proportion of variation in the latent variable was due to the grouping of households within communities. Ethnicity, education, administrative region, distance to urban centres and drought-induced migration significantly explained differences in the mean level of deprivation across Bolivian villages. This analysis highlighted the differences between using the latent variable and using the DHS wealth index; using the latter led to an underestimation of the magnitude of the segregation of deprivation in Bolivia, since the DHS wealth index did not take into account measurement error and the items used in the construction of the two indices were slightly different.
The results of the analysis have implications for social and health policies. By identifying the contextual factors associated with the segregation of deprivation, this paper provides evidence on the mechanisms leading to economic and social segregation. This analysis helps to identify segregation of deprivation within Bolivia, and highlights crucial sectors to be developed in order to reduce spatial unevenness in the distribution of wealth, which is linked to social exclusion, diminished opportunities for human capital development and lower access to public services. Finally, reducing inequality across Bolivian communities could also positively affect health indicators, since contexts of concentrated deprivation are associated with higher mortality and higher exposure to infectious diseases (Fiscella and Franks 1997; Szwarcwald et al. 2002).