Introduction

Nearly one out of eight people in the world live in slums (i.e., informal settlements with precarious dwellings, a lack of essential services, and insecure tenure), which translates to approximately one billion individuals living in slum conditions (UN-Habitat, 2010, 2013, 2016a). Most people living in precarious conditions are found in developing countries, where roughly 863 million individuals, or one-third of their population, reside in informal settlements. Latin America is the world’s second most urbanized region, with 20% to 30% of its urban population residing in informal and impoverished settlements and 60% of the region’s poor individuals living in slums (Vargas et al., 2017). Since it is projected that poverty will continue to urbanize throughout the region, analyzing the determinants of housing informality is crucial for understanding the growth of this type of settlement and proposing government measures to improve the quality of life in informal settlements.

Several factors have been identified in the literature to explain the formation and proliferation of informal settlements. Notably, Roy et al. (2014) and Mahabir et al. (2016) identified factors such as local topography, street patterns, population dynamics, the politics of slums, and the informal economy. While there is ample evidence of the effects of local conditions and policies on the growth of slums (Alves, 2018; Cavalcanti et al., 2019; Marx et al., 2013), the available evidence of the effects of demographics and economics is scarce, particularly concerning how the informal economy relates to the formation and functioning of informal settlements.

According to the International Labor Organization (ILO) (2018), Bonnet et al. (2019), and Deléchat and Medina (2021), 61% of all workers worldwide (around 2 billion workers) are informally employed, with the prevalence of labor informality being significantly higher in developing countries (90%) than in developed countries (18%). In Latin America, 60% of employees are engaged in informal activities, and 62% are extremely poor and vulnerable. The countries in this region with the highest rates of labor informality are Bolivia (79%), Paraguay (70%), Peru (67%), and Colombia (60%) (OECD/ILO, 2019). Labor informality has been reported to have considerable implications for development outcomes, as it is an obstacle to sustainable development due to its close relationship with economic growth, poverty, and inequality. Informal workers lack formal and social protection, are less educated, and work in small firms with low productivity (Del Carpio & Patrick, 2021). As a result, they are more susceptible to poverty and face significant barriers to (or exclusion from) the educational and financial systems (Deléchat & Medina, 2021; Roy et al., 2014).

Concerning the relationship between slums and labor activities, recent research has shown that informal activities are a primary economic driver for the emergence of slum conditions (Cavalcanti, 2017, 2019). For instance, an important characteristic of informal workers is that they typically carry out their activities at home, blurring the boundaries between domestic and labor spaces, thus leading to precarious living conditions (Cavalcanti, 2017; Suárez et al., 2016). Additionally, informal workers often live in peripheral neighborhoods with fewer regulations, lower housing prices, and reduced costs of goods, services, and other commodities (Posada, 2018). Consequently, informal construction proliferates in informal neighborhoods, which results in a high concentration of low-income, informal workers in these informal settlements (Cavalcanti, 2019; Cavalcanti et al., 2019; UN-Habitat, 2011). This pattern of urban informality prevails in developing countries, particularly Latin American cities (Abramo, 2009; Bouillon, 2012; Maloney, 2004; UN-Habitat, 2012).

The spatial relationship between informal housing and informal employment aligns with Rallet’s (2000) concept of organized proximity. In this regard, the relationship of proximity between these two types of informality may be constituted by institutionalized relationships, informal and tacit networks, and hierarchies, or even by the power of force and violence. Posada and Moreno-Monroy (2017) suggest common structural features in the relationship between informal housing and informal employment. Therefore, to understand the factors that influence the formation of slums and the nature of precarious work, it is essential to analyze the relationship between informal work activities and informal housing. Accordingly, this paper aims to provide new evidence on the relationship between informal housing and labor informality and the role of space in this relationship.

For our analysis, we use data from Medellín, the second-largest city in Colombia, which makes for an interesting case study because of its high levels of informal housing and labor informality and the marked spatial segregation of these two phenomena within the urban context. Our empirical analysis is carried out at the intraurban level, considering 176 analytical regions as our spatial units of analysis for 2017. We perform spatial analysis to examine the simultaneity of housing informality and labor informality in space. In addition, we estimate spatial simultaneous equations models to evaluate the simultaneous effects of housing informality and labor informality, considering the spatial dimension of these two phenomena. Such models are estimated using Generalized Spatial Three-Stage Least Squares (GS3SLS), which is a system instrumental variable procedure that includes a Generalized Method of Moments (GMM) estimation when there is spatial interrelation in the error terms (Kelejian & Prucha, 2004). This procedure also allows us to address the endogeneity resulting from the simultaneous occurrence of the two types of informality and the inclusion of spatial lags in the models.

This paper contributes to the existing literature in three ways. First, unlike previous studies that have examined housing informality and labor informality separately (with few considering the spatial dimension of these two phenomena), we study them simultaneously. Second, by conducting a spatial analysis at the intraurban level, we gain a more detailed understanding of the spatial dynamics of informal housing and its relationship with labor informality. Third, our empirical approach considers the spatial spillover effects inherent in both types of informality, allowing us to explore the role of precarious labor conditions in explaining precarious housing conditions, as well as the role of spatial spillover effects. This study is the first to rigorously analyze the spatial relationship between informal housing and labor informality. In addition, it provides empirical evidence and conclusions specific to the case of Medellín and offers conceptual and methodological insights into urban informality relevant to other cities in the Global South.

The rest of this paper is organized as follows. Section "Related literature" discusses the relevant literature on informal housing and its relationship with the informal economy. Section "Data and descriptive statistics" provides an overview of the study area and the data used in this study and presents some descriptive statistics. Sections "Econometric model" and "Results" explain the proposed empirical strategy and report the key findings. Finally, Section "Conclusions" summarizes the main findings and concludes this research.

Related Literature

In this section, we present a literature review and discuss the key factors that influence the emergence and growth of informal housing, focusing on the literature about the informal economy.

A considerable body of work has investigated the factors contributing to slum development. The studies by Roy et al. (2014) and Mahabir et al. (2016) stand out as notable contributions to this literature. Roy et al. (2014) performed a comprehensive review of various studies on slums and identified seven critical factors associated with the formation of informal settlements: population dynamics, economic growth, housing market dynamics, local topography, street pattern, politics of slums, and informal economy. Concerning the informal economy, the authors stated that a large and persistent informal sector (firms and workers) could foster stability within slums and attract more people to live in these areas. Hence, according to these authors, any discussion on informal settlements must consider their relationship with the informal economy.

Mahabir et al. (2016) also conducted an exhaustive literature review and identified four key factors influencing the growth of informal settlements: location, rural-to-urban migration, poor urban governance, and ill-designed policies. Although the authors did not directly mention the relationship between informal work and informal housing, they did highlight that the location choices made by slum dwellers are influenced, in part, by social and economic ties that may be related to similar income-generating activities. In other words, low-income informal workers may attract more informal workers, leading to more low-income people moving into the slums.

Regarding the literature in the Latin American context, we highlight the studies by Bouillon (2012), Posada and Moreno-Monroy (2019), and Alves (2021). In his study, Bouillon (2012) examined the difficulties and challenges facing the housing market in Latin America and the Caribbean. According to the author, the quantitative and qualitative housing deficit in this region, the barriers to accessing the financial system through mortgage loans, and the elevated costs of land and construction are the main factors driving the formation of precarious settlements. Concerning the effects of labor informality, Bouillon (2012) argues that the inability of informal workers to document their income is the main obstacle to accessing a home loan program. In particular, the study finds that about 30% of households lack access to mortgage loans because they do not have sufficient income or cannot document their income. Thus, the growth of informal housing in Latin American cities is mainly caused by low-income populations' lack of access to the financial system and the absence of government policies to address this problem.

For their part, Posada and Moreno-Monroy (2019) studied the relationship between informal housing and informal employment in São Paulo (Brazil) and Bogotá (Colombia). The authors presented a set of stylized facts drawn from these cities and proposed a search model to analyze these types of informality within a unified framework. Among the stylized facts, the authors found that the distribution of informal work-related trips is more decentralized and, therefore, informal workers commute less frequently than formal workers. This pattern can be attributed to a higher prevalence of informal street or home activities. Regarding the housing market, the authors showed that informal settlements are more common in the city’s periphery, while formal housing is concentrated in the city’s center. By incorporating these stylized facts into a search model, the authors demonstrated that, during formalization, there is more intense competition for proximity to the city’s center, where formal jobs are concentrated. This competition increases rental prices for formal housing and, thus, the displacement of informal workers to the city’s periphery. Consequently, as the city expands to accommodate informal and formal workers, the prevalence of informal housing increases.

More recently, Alves (2021) investigated slum growth in Brazilian cities by estimating a spatial equilibrium model. His research contributes to the literature by demonstrating that urban wages rise as cities experience economic growth, attracting many low-income individuals who, given their limited human capital, are predominantly employed in the informal sector. However, as the costs of formal housing increase when compared to informal housing, a significant proportion of low-income individuals engaged in informal activities are compelled to reside in slum neighborhoods, thereby contributing to the expansion of such settlements.

Bonet et al. (2016) and Gallego et al. (2018) are among the few empirical studies in Colombia that have attempted to analyze informal housing and labor informality jointly. Bonet et al. (2016) studied housing informality and labor informality in Colombia’s main cities, investigating their characteristics, dynamics, relationships, and the main factors influencing them. They conducted both aggregate analyses and analyses using household survey microdata. At the aggregate level, the authors found a positive relationship between housing informality and labor informality in the cities under analysis, which has persisted over time. Regarding the relationship between the two indicators of informality at the individual level, the study revealed that 30% of informal workers in Colombia live in informal settlements, while this percentage is 15% for formal workers. The analysis of microeconomic factors influencing informal housing and labor informality demonstrated that being an informal worker increases the likelihood of living in an informal settlement. Their findings also showed that education plays an important role in reducing informality in both housing and employment, while low income levels and household size have a positive effect on informal housing.

For their part, Gallego et al. (2018) are the only authors who have examined informal housing and labor informality at the intraurban level. Even though they focused on investigating the determinants of labor informality at the intraurban level in Medellín (Colombia), they also established, for the first time, a relationship between informal housing and labor informality, considering the spatial dimension of informality. These authors showed the spatial simultaneity of labor informality and informal housing. In particular, they found that areas with high levels of informal housing also have high levels of labor informality and that urban informality is concentrated in the peripheral and impoverished areas of the city. Another important contribution of the authors is their consideration of spatial spillovers, an essential aspect of analyzing urban informality in space.

Despite the prevalence of informal housing and informal employment in Latin American cities, these two types of informality have mainly been studied separately, with little empirical evidence of the relationship between them. To address this gap in the literature, this study attempts to empirically investigate the simultaneity of informal housing and labor informality at the intraurban level. In addition, it seeks to examine the existence of spatial spillover effects, which are crucial in understanding this type of urban phenomenon.

Data and Descriptive Statistics

Study Area and Spatial Unit of Analysis

In this study, we use data from Medellín, a city in the region of Antioquia in northwestern Colombia and the second largest city in the country after Bogotá, the capital. The city has a population of 2.6 million people and spans an area of 380.64 km², resulting in a population density of 6,749 inhabitants per km². In terms of socio-economic variables for 2017, Medellín is characterized by substantial heterogeneity in the labor market, with unemployment and labor informality levels of 10.6% and 42%, respectively (Colombia: unemployment rate: 9.8%; labor informality rate: 47%) (DANE, 2023a). Regarding income inequality and poverty, Medellín presents a Gini coefficient of 0.52 and a poverty rate of 14.2% (Colombia: Gini coefficient: 0.51; poverty rate: 27%) (DANE, 2023b).

Our analysis focuses on the urban part of Medellín, which is divided into 16 districts, or communes, and 246 neighborhoods. Figure 1 shows the spatial distribution of population density and income levels across Medellín. As observed, the most densely populated areas are in the north and southwest of the city. Low and low-middle-income residents predominantly populate the northern part of the city, while wealthy residents primarily populate the southern part. Notably, Medellín stands out from other cities in Colombia and Latin America because of its public transportation system, the Metro system. This system has significantly improved the city’s accessibility, especially for the most remote and low-income residents (Bocarejo et al., 2014).

Fig. 1 Study area: (Panel A) population density and (Panel B) distribution of income categories, Medellín. Notes: Population density (population/km²) by commune. Income levels at the analytical region level are based on economic stratification in six categories (1 = very low, 6 = very high). Source: 2017 Quality-of-Life Survey for Medellín

We use analytical regions as the spatial unit of analysis. As mentioned above, Medellín has two levels of administrative divisions: communes (16) and neighborhoods (246). Communes are very large and internally heterogeneous regions, which poses challenges when using them as spatial units of analysis. In 2017, there was an average of 16 neighborhoods per commune in Medellín. The largest communes were Robledo and El Poblado, with 25 and 22 neighborhoods, respectively, and the smallest communes were Guayabal and Santa Cruz, with 9 and 11 neighborhoods, respectively (see Fig. 1). Statistical inference based on such large administrative regions can lead to aggregation issues such as the ecological fallacy (Robinson, 1950), aggregation bias (Amrhein & Flowerdew, 1992; Fotheringham & Wong, 1991; Paelinck, 2000), and the Modifiable Areal Unit Problem (MAUP) (Openshaw & Taylor, 1981).

Employing neighborhoods as the spatial unit of analysis also entails certain limitations. First, statistical analysis at the neighborhood level may suffer from a lack of statistical validity because the data available for socio-economic studies in Medellín come from the Quality-of-Life Survey, which is designed to be representative at the commune level. Second, calculating rates can be problematic due to the small number of households surveyed at the neighborhood level (Diehr, 1984). In 2017, for instance, the average number of households surveyed per neighborhood in Medellín was 113 (SD = 74.4), with values ranging from 1 to 406, and 19% of neighborhoods had fewer than 50 households surveyed. Lastly, Weeks et al. (2007) noted that using administrative divisions at the neighborhood level could lead to spurious spatial autocorrelation issues.
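The small-numbers problem can be illustrated with a short sketch. It computes the standard error of an estimated rate under simple random sampling, which is an assumption made here purely for illustration (the survey design is more complex). The rate of 6.8% and the sample sizes are illustrative inputs, and `proportion_se` is a hypothetical helper name.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion p estimated from n households.

    Assumes simple random sampling (an illustrative simplification).
    """
    return math.sqrt(p * (1.0 - p) / n)

p = 0.068  # illustrative rate, not an estimate for any particular neighborhood
for n in (25, 50, 100, 400):
    se = proportion_se(p, n)
    # The relative error shrinks with the square root of the sample size.
    print(f"n = {n:3d}: standard error = {se:.3f} ({se / p:.0%} of the rate)")
```

Under these assumptions, halving the number of surveyed households inflates the standard error by a factor of about 1.4, which is why rates computed for sparsely surveyed neighborhoods are unstable.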

An alternative approach to overcoming the limitations of using communes or neighborhoods as the spatial unit of analysis is to employ analytical regions. These regions are spatial units that meet specific criteria, such as size, shape, and internal homogeneity, making them suitable for studying socio-economic phenomena in space (Duque et al., 2013). In this study, we follow the strategy used by Duque et al. (2013) and Duque et al. (2015). According to these authors, each analytical region must satisfy two conditions: (1) it should be a homogeneous region in terms of a set of relevant socio-economic variables for the phenomenon under analysis (Duque et al., 2006, 2013), and (2) it should contain a minimum threshold of surveyed households to ensure statistical representativeness (Duque et al., 2015). The first criterion contributes to minimizing aggregation bias, while the second requirement helps to solve the problem of small numbers of households surveyed (Diehr, 1984) and, in turn, reduces the impact of inaccuracies in geolocation (Duque et al., 2015).

To delineate the regions, we use the Max-P-regions algorithm as implemented in ClusterPy (Duque et al., 2011, 2013). This algorithm aggregates areas into the maximum number of regions such that each region is homogeneous with respect to a set of socio-economic characteristics and satisfies a predefined minimum threshold of surveyed households. This approach is primarily a minimization process of intra-regional heterogeneity and aggregation bias, and it is the only algorithm that allows endogenizing the number of regions (Duque et al., 2013). We follow the strategy used by Duque et al. (2013), who designed analytical regions to study intraurban poverty and calculate a slum index in Medellín, a purpose that fits our study well. Accordingly, we use information from the 2017 Quality-of-Life Survey for Medellín on variables related to house characteristics and household members. For the minimum threshold of surveyed households, we adopt the value established by Duque et al. (2013), namely a minimum of 100 surveyed households. According to the authors, this threshold attempts to capture the heterogeneity in the larger spatial units so that the new smaller regions reflect that heterogeneity. The authors also note that a value of 100 surveyed households amounts to oversampling to ensure statistical validity, but that setting the optimal threshold remains an open question requiring further research. The number of analytical regions obtained for Medellín is very similar to that found by Duque et al. (2013): the 246 neighborhoods were grouped into 176 analytical regions, with an average area of 0.521 km². Figure 2 illustrates the comparison between analytical regions and the neighborhoods.
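The logic of threshold-constrained regionalization can be sketched with a toy example. The code below is a simple greedy construction heuristic in the spirit of the approach described above; it is not the Max-P-regions algorithm as implemented in ClusterPy, and it does not search for the maximum number of regions. The function name `grow_regions`, the seven areas, their household counts, and the single homogeneity attribute are all hypothetical.

```python
def grow_regions(households, attribute, neighbors, threshold):
    """Greedily group contiguous areas until each region reaches `threshold`
    surveyed households. Returns a dict mapping area -> region id."""
    label = {}
    next_id = 0
    for seed in sorted(households):              # deterministic seed order
        if seed in label:
            continue
        members, total = [seed], households[seed]
        while total < threshold:
            # Unassigned areas contiguous to the growing region.
            frontier = {n for m in members for n in neighbors[m]
                        if n not in label and n not in members}
            if not frontier:
                break
            mean = sum(attribute[m] for m in members) / len(members)
            # Add the most similar neighbor (ties broken by area id).
            best = min(frontier, key=lambda n: (abs(attribute[n] - mean), n))
            members.append(best)
            total += households[best]
        if total >= threshold:                   # keep only feasible regions
            for m in members:
                label[m] = next_id
            next_id += 1
    # Leftover areas (enclaves) join the most similar adjacent region.
    pending = [a for a in households if a not in label]
    while pending:
        progressed = False
        for a in list(pending):
            adjacent = [n for n in neighbors[a] if n in label]
            if adjacent:
                best = min(adjacent, key=lambda n: (abs(attribute[n] - attribute[a]), n))
                label[a] = label[best]
                pending.remove(a)
                progressed = True
        if not progressed:
            break
    return label

# Hypothetical data: seven areas arranged along a line.
households = {0: 60, 1: 50, 2: 40, 3: 70, 4: 30, 5: 80, 6: 20}
attribute = {0: 1.0, 1: 1.0, 2: 2.0, 3: 5.0, 4: 5.0, 5: 6.0, 6: 6.0}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}

labels = grow_regions(households, attribute, neighbors, threshold=100)
totals = {}
for area, region in labels.items():
    totals[region] = totals.get(region, 0) + households[area]
print(labels)
print(totals)
```

In this toy case every resulting region is contiguous and contains at least 100 surveyed households, and the last area, which cannot reach the threshold on its own, is absorbed by its neighbor. A full implementation would add a local search phase to improve homogeneity.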

Fig. 2 Spatial unit of analysis: 176 analytical regions

Preliminary Evidence

The data used in this paper come from the 2017 Quality-of-Life Survey for Medellín. This annual cross-sectional survey provides individual-level information on household characteristics, demographics, education, social security, labor market dynamics, and indicators of poverty and socio-economic conditions. The sample considered in this study includes individuals aged between 18 and 60 residing in the urban area of Medellín. It includes approximately 43,000 individuals and 12,000 households, which correspond to an expanded sample of around 2.4 million individuals and 780,000 households. The data were aggregated at the analytical region level to calculate the rates for the variables of interest.

Regarding the definition of informal housing, we adopt the definition proposed by the United Nations Human Settlements Programme (UN-Habitat) (2003, 2006, 2016b), which includes some type of housing deprivation, such as low standards of urban services, insecure land tenure, or non-durable housing structures. According to this definition, slums are settlements that have one or more of the following characteristics: (1) walls that are not made of durable materials, (2) overcrowded, inadequate living spaces (i.e., more than three people per room), or (3) lack of access to either potable water or improved sanitation services. This definition has been widely used in the literature and applied in different countries. Some of these studies include those by Rains and Krishna (2020) in India; Brueckner (2013) in Indonesia; Aliu et al. (2021) in Nigeria; Duque et al. (2015) and Gallego et al. (2018) in Colombia; Galiani et al. (2017) in El Salvador, Mexico and Uruguay; and Cavalcanti et al. (2019) and Alves (2021) in Brazil.

As for the definition of labor informality, we opt for a legalistic definition that regards informality as the lack of mandated labor protection for workers. In particular, we define informal workers as those not covered by the health insurance and pension systems. This definition is commonly used in empirical studies on labor informality, as it is easy to implement using household surveys across different countries. Moreover, it is much broader, allowing for the inclusion of a more heterogeneous group of informal workers (García, 2019; Perry et al., 2007).

Using these definitions, we calculate the variables of interest related to informality at the analytical region level. On the one hand, we compute the informal housing rate as the proportion of households within each analytical region that exhibit any of the characteristics of a slum. On the other hand, the labor informality rate is determined by the ratio of workers not covered by the social security system to the total number of workers in each analytical region. By employing this approach, we find that, on average, 6.8% of dwellings in Medellín were classified as informal, with some regions having informal housing rates as high as 26%. In terms of labor informality, we find that the average labor informality rate in the analytical regions was 33%, with a maximum rate of 64% (see Table 1).
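The two rates can be sketched in a few lines. This is a minimal illustration and not the authors' processing code: the record fields (`durable_walls`, `persons_per_room`, and so on) are hypothetical names, the toy records are invented, survey expansion factors are ignored, and the worker rule assumes that a worker counts as formal only when covered by both health insurance and pension.

```python
from collections import defaultdict

def is_informal_dwelling(hh):
    """Any one of the three slum-type deprivations is enough."""
    return (
        not hh["durable_walls"]
        or hh["persons_per_room"] > 3
        or not (hh["potable_water"] and hh["improved_sanitation"])
    )

def is_informal_worker(w):
    """Assumption: formal only if covered by BOTH health insurance and pension."""
    return not (w["health"] and w["pension"])

def rate_by_region(records, is_informal):
    """Unweighted share of informal records within each region."""
    counts = defaultdict(lambda: [0, 0])         # region -> [informal, total]
    for rec in records:
        counts[rec["region"]][0] += int(is_informal(rec))
        counts[rec["region"]][1] += 1
    return {region: inf / tot for region, (inf, tot) in counts.items()}

def household(region, walls=True, ppr=1.5, water=True, sanitation=True):
    return {"region": region, "durable_walls": walls, "persons_per_room": ppr,
            "potable_water": water, "improved_sanitation": sanitation}

# Invented toy records for two regions, "A" and "B".
households = [household("A"), household("A"), household("A"),
              household("A", ppr=4.0),           # overcrowded
              household("B", water=False),       # no potable water
              household("B")]
workers = [{"region": "A", "health": True, "pension": True},
           {"region": "A", "health": True, "pension": True},
           {"region": "A", "health": True, "pension": False},
           {"region": "B", "health": False, "pension": False},
           {"region": "B", "health": True, "pension": False}]

housing_rate = rate_by_region(households, is_informal_dwelling)
labor_rate = rate_by_region(workers, is_informal_worker)
print(housing_rate, labor_rate)
```

With expansion factors available, the counts would be replaced by sums of weights; the structure of the calculation stays the same.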

Table 1 Descriptive statistics of informality, control and instrumental variables

Figure 3 shows the spatial distribution of the two types of informality. As observed, there is a simultaneity of high levels of housing informality and labor informality in the city's most peripheral and poorest areas, specifically in the northeast and west. In these areas, the housing informality rates range from 19 to 26%, while the labor informality rates range from 48 to 66%.

Fig. 3 Spatial distribution of housing and labor informality, 2017

To confirm the spatial simultaneity of the two types of informality, we compute a bivariate map of informal housing and labor informality. This map, shown in Fig. 4, confirms the strong spatial simultaneity of both types of informality. Regions with high levels of informal housing also exhibit high levels of labor informality, particularly in the city’s eastern, western, and northern peripheries, which are densely populated by low-income individuals. Similar patterns of urban informality in peri-urban areas have been observed in other studies conducted in developing countries, such as that by Martínez-Jiménez et al. (2022).

Fig. 4 Bivariate map of housing informality and labor informality, 2017

Furthermore, to explore the spatial dependence and clustering processes between the two informality types, we compute a bivariate Local Indicator of Spatial Association (LISA) cluster map (Anselin et al., 2002). This bivariate LISA cluster map, depicted in Fig. 5, reveals local spatial correlation patterns at the analytical region level between housing informality and neighboring regions' average labor informality rate. These local statistics are vital because they show that the magnitude of the spatial relationship between informal housing and labor informality is unevenly distributed across space. Concerning clustering, the results show that the study area is dominated by spatial clusters (high-high and low-low locations) rather than by spatial outliers (high-low and low-high locations). Clustering processes of informality are observed, wherein areas with high levels of housing informality are surrounded by nearby areas with high levels of labor informality (hot spots of intraurban informality) in the northeast and certain eastern parts of the city. Conversely, areas with low levels of informal housing are surrounded by areas with low levels of labor informality (cold spots of intraurban informality) in the western and southern parts of the city. This north-south division of housing and labor conditions reveals the marked spatial segregation of these two phenomena at the intraurban level in Medellín, which is also associated with the socio-spatial segregation prevalent in the city (Villarraga et al., 2014).
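The bivariate local statistic can be sketched as follows. For each area, the standardized housing informality rate is multiplied by the average standardized labor informality rate of its neighbors, and the signs of the two terms give the quadrant (HH, LL, HL, LH). This is a minimal sketch under stated assumptions: the four areas and their rates are invented, every area is assumed to have at least one neighbor, and the permutation-based significance test used for the cluster map is omitted.

```python
import math

def standardize(values):
    """Z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def bivariate_local_moran(x, y, neighbors):
    """Return (statistic, quadrant) per area: z_x[i] times the
    row-standardized spatial lag of z_y. Assumes no area is an island."""
    zx, zy = standardize(x), standardize(y)
    results = []
    for i, zxi in enumerate(zx):
        lag = sum(zy[j] for j in neighbors[i]) / len(neighbors[i])
        quadrant = ("H" if zxi > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((zxi * lag, quadrant))
    return results

# Invented rates (in %) for four areas arranged along a line.
housing = [10.0, 8.0, 2.0, 1.0]
labor = [9.0, 7.0, 3.0, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

local_stats = bivariate_local_moran(housing, labor, neighbors)
for area, (stat, quadrant) in enumerate(local_stats):
    print(area, round(stat, 3), quadrant)
```

Positive statistics correspond to clusters (HH or LL) and negative ones to spatial outliers (HL or LH); in practice only locations that pass the permutation test are mapped.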

Fig. 5 Bivariate LISA cluster map, 2017. Note: number of permutations = 999 and significance level at 5%. First-order Queen criterion of contiguity was used to calculate the spatial weights

As control variables, we include socio-economic conditions related to human capital, demographics, and labor characteristics. In particular, we consider the share of the population with tertiary education (% Tertiary education), the share of female heads of household (% Female heads of household), the share of children under six years of age (% Children), population density, and the unemployment rate. Table 1 presents the main descriptive statistics for these variables, and Fig. 6 displays their spatial distribution. As observed in the maps, the lowest levels of education and the highest levels of unemployment are concentrated in the peripheral and more informal areas of the city, which supports the existence of a relationship between urban informality and precarious socio-economic conditions. This relationship has also been documented by Roy et al. (2014), Villarraga et al. (2014), Mahabir et al. (2016), and Martínez-Jiménez et al. (2022).

Fig. 6 Spatial distribution of the control variables

Econometric Model

To measure the relationship between housing informality and labor informality while considering the simultaneity of these two phenomena, we estimate the following system of two equations:

$${housing\;informality}_{r}={\beta }_{11}{labor\;informality}_{r}+{\mathbf{x}}'_{r}{{\varvec{\omega}}}_{1}+{u}_{1r}$$
(1)
$${labor\;informality}_{r} ={\beta }_{21}{housing\;informality}_{r}+{\mathbf{x}}'_{r}{{\varvec{\omega}}}_{2}+{u}_{2r},$$
(2)

where \({housing\;informality}_{r}\) and \({labor\;informality}_{r}\) represent the levels of housing and labor informality, respectively, in analytical region r, and they are the endogenous variables of the system. The term \({\mathbf{x}}_{r}\) is a vector of control variables, which include the region's human capital, demographics, and labor characteristics (see Table 1). Additionally, \({{\varvec{\omega}}}_{1}\) and \({{\varvec{\omega}}}_{2}\) are vectors of coefficients to be estimated, and \({u}_{1r}\) and \({u}_{2r}\) denote the stochastic errors. Our coefficients of interest are \({\beta }_{11}\) and \({\beta }_{21}\), which represent the effect of labor informality on informal housing and vice versa, respectively.
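A short derivation shows why the two equations cannot be estimated separately by ordinary least squares. Writing \(h_r\) and \(l_r\) as shorthand for housing and labor informality (notation introduced here only for brevity), substituting Eq. (1) into Eq. (2), and assuming \({\beta }_{11}{\beta }_{21}\ne 1\) gives the reduced form for labor informality:

$$l_{r}=\frac{{\mathbf{x}}'_{r}\left({\beta }_{21}{{\varvec{\omega}}}_{1}+{{\varvec{\omega}}}_{2}\right)+{\beta }_{21}{u}_{1r}+{u}_{2r}}{1-{\beta }_{11}{\beta }_{21}}$$

If the controls are exogenous, it follows that

$$\mathrm{Cov}\left(l_{r},{u}_{1r}\right)=\frac{{\beta }_{21}\mathrm{Var}\left({u}_{1r}\right)+\mathrm{Cov}\left({u}_{1r},{u}_{2r}\right)}{1-{\beta }_{11}{\beta }_{21}},$$

which is generally different from zero. Labor informality is therefore correlated with the error term of Eq. (1), and by the symmetric argument housing informality is correlated with the error term of Eq. (2).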

To consider the spatial dimension of housing informality and labor informality, we estimate four types of spatial simultaneous models: the Spatial Lag Model (SLM), the Spatial Error Model (SEM), the Spatial Autoregressive Combined (SAC) model, and the Spatial Durbin Model (SDM). The SLM incorporates a spatial lag of the dependent variable as an explanatory factor; hence, the system is given by

$$\begin{array}{c}{housing\;informality}_{r}={\beta }_{11}{labor\;informality}_{r}+{\rho }_{1}W{housing\;informality}_{r}\\ + {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{1}+ {u}_{1r}\end{array}$$
(3)
$$\begin{array}{c}{labor\;informality}_{r}={\beta }_{21}{housing\;informality}_{r}+{\rho }_{2}W{labor\;informality}_{r}\\ + {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{2}+ {u}_{2r}\end{array}$$
(4)

For its part, the SEM incorporates the spatial dependence on stochastic disturbances, that is,

$${housing\;informality}_{r}={\beta }_{11}{labor\;informality}_{r}+ {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{1}+{u}_{1r}$$
(5)
$${u}_{1r}={\gamma }_{1}W{u}_{1r}+{\varepsilon }_{1r}$$
(6)
$${labor\;informality}_{r} ={\beta }_{21}{housing\;informality}_{r}+{\mathbf{x}}'_{r}{{\varvec{\omega}}}_{2}+{u}_{2r}$$
(7)
$${u}_{2r}={\gamma }_{2}W{u}_{2r}+{\varepsilon }_{2r},$$
(8)

where \({\varepsilon }_{1r}\) and \({\varepsilon }_{2r}\) are stochastic errors.
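In matrix form, the error processes in Eqs. (6) and (8) can be solved explicitly. Stacking the regions into vectors \({\mathbf{u}}_{k}\) and \({{\varvec{\varepsilon}}}_{k}\) for \(k=1,2\) (stacked notation introduced here), and assuming that \(I-{\gamma }_{k}W\) is invertible, which holds for a row-standardized \(W\) when \(|{\gamma }_{k}|<1\), we obtain

$${\mathbf{u}}_{k}={\gamma }_{k}W{\mathbf{u}}_{k}+{{\varvec{\varepsilon}}}_{k}\quad \Rightarrow \quad {\mathbf{u}}_{k}={\left(I-{\gamma }_{k}W\right)}^{-1}{{\varvec{\varepsilon}}}_{k}$$

A shock to one region's error therefore propagates to its neighbors, to the neighbors of its neighbors, and so on, with an intensity governed by \({\gamma }_{k}\).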

The SAC model combines the SLM and the SEM, including the spatial dependence on the dependent variable and stochastic disturbances. The structure of the SAC model is as follows:

$$\begin{array}{c}{housing\;informality}_{r}={\beta }_{11}{labor\;informality}_{r}+{\rho }_{1}W{housing\;informality}_{r}\\ + {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{1}+{u}_{1r}\end{array}$$
(9)
$${u}_{1r}={\gamma }_{1}W{u}_{1r}+{\varepsilon }_{1r}$$
(10)
$$\begin{array}{c}{labor\;informality}_{r} ={\beta }_{21}{housing\;informality}_{r}+{\rho }_{2}W{labor\;informality}_{r}\\ {+\mathbf{x}}'_{r}{{\varvec{\omega}}}_{2}+{u}_{2r}\end{array}$$
(11)
$${u}_{2r}={\gamma }_{2}W{u}_{2r}+{\varepsilon }_{2r}$$
(12)

Finally, the SDM incorporates the spatial lag of the dependent variable and the spatial lags of the independent variables as additional explanatory variables. The structure of this model is as follows:

$$\begin{aligned}{housing\;informality}_{r}&={\beta }_{11}{labor\;informality}_{r}+ {\lambda }_{1}W{labor\;informality}_{r}\\&\quad+{\rho }_{1}W{housing\;informality}_{r}\\&\quad+ {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{1} + W{\mathbf{x}}'_{r}{{\varvec{\theta}}}_{1}\\&\quad+ {u}_{1r}\end{aligned}$$
(13)
$$\begin{aligned}{labor\;informality}_{r}&={\beta }_{21}{housing\;informality}_{r}+{\lambda }_{2}W{housing\;informality}_{r}\\&\quad+{\rho }_{2}W{labor\;informality}_{r}\\&\quad+ {\mathbf{x}}'_{r}{{\varvec{\omega}}}_{2} + W{\mathbf{x}}'_{r}{{\varvec{\theta}}}_{2}\\&\quad+ {u}_{2r}\end{aligned}$$
(14)

In all the models, \(W\) is an \((r \times r)\) matrix of spatial connectivity between the analytical regions (Anselin, 1988), which, in our case, is a standardized first-order Queen contiguity matrix.
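As an illustration of how such a matrix operates, the sketch below builds a first-order Queen contiguity matrix for a hypothetical regular grid and row-standardizes it. The grid layout, the function name `queen_weights`, and the choice of row standardization are assumptions made for the example; the analytical regions in the study are irregular polygons, for which contiguity would be derived from shared borders and vertices.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized first-order Queen contiguity matrix for a regular grid.

    Two cells are neighbors if they share an edge or a corner.
    The grid is a hypothetical stand-in for real analytical regions.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # a region is not its own neighbor
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n_rows and 0 <= nj < n_cols:
                        W[i * n_cols + j, ni * n_cols + nj] = 1.0
    # Row standardization (assumed): each row sums to one
    return W / W.sum(axis=1, keepdims=True)

W = queen_weights(3, 3)
y = np.arange(9, dtype=float)   # illustrative values, one per region
Wy = W @ y                      # spatial lag: average of y over neighbors
print(Wy.round(3))
```

Under row standardization, `W @ y` is the average of `y` over each region's neighbors, which is one way to read the spatial lag terms in the systems above.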

Estimating the proposed system of equations may face two problems related to endogeneity (Le Gallo & Fingleton, 2021). The first problem arises from including endogenous variables as regressors, that is, the inclusion of the dependent variable from the second equation (labor informality) in the first equation and the inclusion of the dependent variable from the first equation (housing informality) in the second equation. The second problem stems from including a spatial lag in the model. To deal with these endogeneity problems, we use an Instrumental Variable (IV) approach.

To address the first endogeneity problem arising from the inclusion of endogenous variables as regressors, we use the Euclidean distance (in km) from the centroid of each analytical region to Medellín’s Central Business District (CBD) as an instrument for the labor informality variable in the first equation (see Table 1 for some descriptive statistics). This instrument is based on the spatial mismatch hypothesis, which states that poor labor market outcomes, both in quantity and quality, can be attributed to intraurban disconnection (Kain, 1968). In other words, greater disconnection from or distance to job opportunities can lead to adverse labor market outcomes due to factors such as high commuting costs, less effective and less intensive job searches, higher job search costs, and employer discrimination against employees living in outlying and marginalized areas (Gobillon et al., 2007). Under this hypothesis, the instrument can plausibly identify the effect of labor informality on housing informality, since it is relevant to explaining labor informality but does not directly influence housing decisions (exclusion restriction).

Regarding the endogeneity problem related to the housing informality variable in the second equation, we use the level of housing informality lagged in time (2011) as an instrument. The idea behind this instrument is that the past level of housing informality in each region tends to explain the level of housing informality in subsequent periods. Indeed, using a time series approach, Bonet et al. (2016) confirmed the temporal inertia of housing informality in several cities in Colombia. According to Liu (2017), using lagged variables as instruments aims to avoid the simultaneity caused by local shocks. Therefore, the lagged level of housing informality could serve as a suitable instrument in terms of both relevance and exogeneity, as it explains the emergence of informal settlements without directly determining the current level of informal work in the regions. In the literature, lagged values of endogenous variables have been widely used as instruments in urban economics since the pioneering work of Ciccone and Hall (1996). Other applications of these lagged values can be found in studies such as those by Moretti (2004) and Liu (2017).

To address the second endogeneity problem, associated with the inclusion of a spatial lag in the models, we follow the spatial econometric literature, which states that higher-order spatial lags of the control variables are adequate instruments to correct this endogeneity issue (Kelejian & Robinson, 1993; Kelejian & Prucha, 1998; Fingleton & Le Gallo, 2010). According to López et al. (2020), spatial lags of order 2 are enough; thus, we use these lags as instrumental variables to tackle this endogeneity problem.
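The instrumenting strategy for the spatial lag can be sketched for a single equation as follows. This is a minimal illustration on simulated data, not the estimation code behind the reported results: the grid-based weights matrix, the function names `queen_weights` and `spatial_2sls`, and all parameter values are assumptions, and the instrument set is taken here to be the controls together with their first- and second-order spatial lags.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized Queen contiguity matrix on a hypothetical grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < n_rows and 0 <= nj < n_cols:
                        W[i * n_cols + j, ni * n_cols + nj] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spatial_2sls(y, X, W):
    """2SLS for y = a + X b + rho * W y + u, instrumenting W y.

    X must not contain a constant: with row-standardized W the spatial
    lag of a constant is the constant itself, so it adds no instrument.
    Returns [a, b_1, ..., b_k, rho].
    """
    n = y.shape[0]
    const = np.ones((n, 1))
    Wy = (W @ y).reshape(-1, 1)
    Z = np.hstack([const, X, Wy])            # regressors (W y is endogenous)
    WX = W @ X
    H = np.hstack([const, X, WX, W @ WX])    # instruments: X, WX, W^2 X
    # First stage: fitted values of the regressors given the instruments
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
    # Second stage: IV estimator
    return np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)

# Simulated example with illustrative parameter values
rng = np.random.default_rng(0)
W = queen_weights(15, 15)
n = W.shape[0]
X = rng.normal(size=(n, 2))
rho, a, b = 0.4, 0.3, np.array([1.0, -0.5])
u = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - rho * W, a + X @ b + u)
delta_hat = spatial_2sls(y, X, W)
print(delta_hat.round(3))   # order: intercept, two slopes, rho
```

In the full system, the endogenous regressor from the other equation would be added to `Z` and its own instrument (distance to the CBD or lagged housing informality) to `H`.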

Another methodological aspect to be considered in estimating the proposed system of spatial simultaneous equations is the presence of cross-equation correlations in the error terms. This study assumes that unobserved regional characteristics influence informality in both the housing and labor markets. For instance, limited access to public transportation, or access only to poor or informal services, may constrain individuals’ ability to reach formal employment opportunities and isolate certain parts of the city that could become informal settlements (Boisjoly et al., 2017; Posada & Moreno-Monroy, 2019; Vargas et al., 2017). Another relevant unobservable factor is the housing deficit, which is quite evident in developing countries. This deficit limits access to public services, affecting health and skill accumulation, and incentivizes informal housing solutions, especially among the most economically disadvantaged segments of the population (Cavalcanti et al., 2019; Vargas et al., 2017).

In summary, our model is a system of spatial simultaneous equations that allows for correlation between the errors of the equations. We use the Generalized Spatial Three-Stage Least Squares (GS3SLS) estimator proposed by Kelejian and Prucha (2004) to estimate this model. This procedure first corrects the endogeneity problem using a two-stage estimator, namely the Generalized Spatial Two-Stage Least Squares (GS2SLS) method. Then, it accounts for the relationship between the equations in the system through the stochastic disturbances, in a manner analogous to a Seemingly Unrelated Regression (SUR).
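As a sketch of the third stage in generic textbook notation (the symbols below are introduced here for illustration and are not taken from the original specification), let \(Z_j\) collect all right-hand-side variables of equation \(j\), including the endogenous regressor and any spatial lags, let \(\hat{Z}_j\) denote their fitted values from the projection on the instruments, let \(\hat{u}_j\) be the second-stage residuals, and let \(n\) be the number of analytical regions. The cross-equation covariance matrix and the system estimator can then be written as

$$\hat{\Sigma }=\frac{1}{n}\hat{U}'\hat{U},\qquad \hat{U}=\left[\hat{u}_{1}\;\;\hat{u}_{2}\right]$$

$$\hat{\delta }_{3SLS}={\left[\hat{Z}'\left({\hat{\Sigma }}^{-1}\otimes {I}_{n}\right)\hat{Z}\right]}^{-1}\hat{Z}'\left({\hat{\Sigma }}^{-1}\otimes {I}_{n}\right)y,$$

where \(\hat{Z}\) is the block-diagonal matrix formed by \(\hat{Z}_1\) and \(\hat{Z}_2\), and \(y\) stacks the two dependent variables. When the disturbances are spatially autocorrelated, as in the SEM and the SAC model, our understanding is that the procedure applies these expressions to spatially filtered variables of the form \(y_j-\hat{\gamma }_j W y_j\), with \(\hat{\gamma }_j\) estimated beforehand.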

Results

Table 2 presents the results of the non-spatial and spatial estimations.Footnote 4 For comprehensive estimates including all variables, direct and indirect effects, and first-stage estimates, please refer to Tables 3, 4, and 5 in the Appendix, respectively.

Table 2 Estimates of system of spatial simultaneous equations

We begin by discussing the validity of the instruments, that is, whether they are exogenous and relevant. For the family of spatial linear models, there is not yet a test of instrument exogeneity comparable to the Sargan/Hansen test available for non-spatial linear models. In addition, the system of equations is exactly identified, which means that tests of overidentifying restrictions cannot be calculated. Therefore, we base the discussion of the exogeneity condition on the theoretical predictions and the literature reviewed in the previous section. Regarding the relevance of the instrumental variables, we use the first-stage estimates to corroborate this condition. Column (1) in Table 5 in the Appendix shows that the level of housing informality lagged in time (2011) is relevant in explaining the current level of housing informality, as the estimated coefficient is positive and statistically significant. This result confirms the theoretical prediction made in the previous section that housing informality is a very persistent phenomenon (Bonet et al., 2016). In the case of the labor informality variable, which is instrumented by the Euclidean distance from each analytical region to the CBD, we observe in column (2) in Table 5 in the Appendix that the estimated coefficient is positive and statistically significant. This result confirms the spatial mismatch hypothesis, i.e., greater distance to the CBD or intraurban disconnection leads to poor labor market outcomes (Gobillon et al., 2007), which in our case is related to higher labor informality.

Considering the main estimates in Table 2, we observe that the estimated coefficients for housing and labor informality are positive and statistically significant, indicating that higher levels of labor informality positively impact the emergence of informal housing and vice versa. This finding confirms that these two urban informality phenomena affect each other. To corroborate the presence of spatial dependence in this system of simultaneous equations, we compute the Lagrange Multiplier (LM)-lag and LM-error tests of spatial dependence (Mur et al., 2010). According to the results of these tests, which are reported at the bottom of Table 2, both simple LM tests are significant, suggesting the presence of spatial dependence in the model. The robust LM tests, for their part, help us understand what type of spatial dependence may be at work. The results show that the LM-lag test robust to spatial dependence in the error remains significant, while the LM-error test robust to a spatial lag becomes insignificant, indicating that once spatial dependence through the lagged dependent variable is considered, the spatial autocorrelation in the errors disappears.
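For intuition, the simple LM-error statistic can be sketched in its single-equation textbook form, which is not necessarily the version reported in Table 2. The ring-shaped weights matrix, the function names, and the residual vector below are hypothetical; the p-value uses the chi-squared distribution with one degree of freedom.

```python
import math
import numpy as np

def lm_error(resid, W):
    """Single-equation LM test for spatial error dependence.

    LM = (e'We / sigma^2)^2 / tr(W'W + W W), asymptotically chi2(1)
    under the null of no spatial autocorrelation in the errors.
    """
    n = resid.shape[0]
    sigma2 = resid @ resid / n
    T = np.trace(W.T @ W + W @ W)
    lm = (resid @ W @ resid / sigma2) ** 2 / T
    # Survival function of chi2(1): P(X > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

def ring_weights(n):
    """Hypothetical weights: regions on a circle, two neighbors each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

# Residuals that alternate in sign between neighbors (illustrative only)
W = ring_weights(4)
e = np.array([1.0, -1.0, 1.0, -1.0])
lm, p_value = lm_error(e, W)
print(round(float(lm), 3), round(p_value, 4))
```

The robust variants discussed above adjust this statistic for the possible presence of a spatial lag, and the LM-lag statistic for the possible presence of spatial errors.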

Furthermore, we assessed the correlation between the error terms of the equations using the standard Breusch-Pagan test of the diagonality of the variance-covariance matrix of the disturbances across equations (Baltagi, 2011). As shown in Table 2, the Breusch-Pagan statistic is significant in all the models, so we reject the null hypothesis of zero correlation between the errors of the two equations. It is therefore important to allow for cross-equation correlation in the error terms when estimating the system of equations related to informality.
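For a two-equation system, this diagonality test reduces to the sample size times the squared correlation between the residuals of the two equations, compared against a chi-squared distribution with one degree of freedom. The sketch below illustrates the calculation on simulated residuals; the function name and the data are hypothetical.

```python
import math
import numpy as np

def breusch_pagan_diagonality(u1, u2):
    """Breusch-Pagan LM test of zero cross-equation error correlation.

    For two equations, LM = n * r^2, where r is the correlation between
    the residual vectors (computed from uncentered moments, since
    regression residuals with an intercept have mean zero).
    """
    n = u1.shape[0]
    r = (u1 @ u2) / math.sqrt((u1 @ u1) * (u2 @ u2))
    lm = n * r ** 2
    # Survival function of chi2(1): P(X > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Simulated residuals sharing a common unobserved component
rng = np.random.default_rng(0)
common = rng.normal(size=200)
u1 = common + rng.normal(size=200)
u2 = common + rng.normal(size=200)
lm, p_value = breusch_pagan_diagonality(u1, u2)
print(round(float(lm), 2), p_value)
```

A small p-value points to correlated disturbances, which is the case in which a system estimator can improve on equation-by-equation estimation.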

Columns 3 to 10 of Table 2 report the estimates of the spatial models. The results for all models consistently indicate that labor informality has a positive and statistically significant effect on informal housing, suggesting that precarious labor conditions are a crucial driver of precarious housing conditions. Moreover, the estimates reveal that informal housing positively and significantly affects labor informality. In this regard, several studies have highlighted the negative effects of low-quality housing on multiple factors of individual and family well-being, including physical and mental health, stress indicators, family interactions, and work-related aspects (Cattaneo et al., 2009; Field, 2005; Galiani et al., 2017). In the case of the latter, the literature has shown that inadequate housing conditions are associated with a higher susceptibility to illness, increased time spent on cleaning tasks, reduced labor productivity, limited time available for work, and a greater propensity to engage in low-quality jobs (Cattaneo et al., 2009). These findings confirm the bidirectional causality between these two types of urban informality at the intraurban level in Medellín.

Since the point estimates do not account for the presence of indirect effects associated with the spatial structure of the models, we estimated both the direct and indirect effects, along with their statistical significance, in the SLM and the SAC model (see Table 4 in the Appendix), following LeSage and Pace (2009), Elhorst (2010), and Halleck Vega and Elhorst (2015). The estimated direct effects of labor informality on informal housing reaffirm our previous findings that informal work activities are critical to the emergence of informal settlements. As noted in the literature review, the low wages earned by informal workers and their inability to document their income exclude them from the financial system and from formal home ownership opportunities; as a result, these workers are forced to rely on informal housing as an alternative.
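The decomposition into direct and indirect effects can be illustrated for a single-equation spatial lag model, where the matrix of partial derivatives of the outcome with respect to a regressor is the coefficient times the inverse of (I − ρW). The sketch below uses a hypothetical grid-based weights matrix and illustrative values of ρ and β that are not estimates from this study; in the simultaneous system, effects also propagate through the other equation, which this simplified version ignores.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Row-standardized Queen contiguity matrix on a hypothetical grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < n_rows and 0 <= nj < n_cols:
                        W[i * n_cols + j, ni * n_cols + nj] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def slm_effects(rho, beta, W):
    """Average direct, indirect, and total effects in a spatial lag model.

    S[i, j] is the effect on region i of a unit change in the regressor
    in region j. Direct effect: mean of the diagonal. Total effect: mean
    row sum. Indirect (spillover) effect: total minus direct.
    """
    n = W.shape[0]
    S = beta * np.linalg.inv(np.eye(n) - rho * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total

W = queen_weights(5, 5)
direct, indirect, total = slm_effects(rho=0.4, beta=0.5, W=W)  # illustrative values
print(round(direct, 4), round(indirect, 4), round(total, 4))
```

With a row-standardized matrix, the total effect equals β / (1 − ρ), and the direct effect exceeds β because of feedback through neighboring regions.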

In the SLM and the SAC model, we observed that the indirect effect (or spillover effect) of labor informality on informal housing is not statistically significant, indicating that the level of informal housing in one region is not affected by the level of labor informality in neighboring regions. This result confirms the highly localized relationship between informal work activities and informal settlements, implying that many poor workers may find shelter and work opportunities within the slum economy. This is consistent with the findings of Marx et al. (2013) and van Ham and Manley (2015), who reported that although slum economies may offer a greater range of services and work opportunities, they face structural constraints that lead to less formal housing and slum growth.

Conclusions

This study provides evidence of a positive relationship between informal housing and informal work activities at the intraurban level in Medellín (Colombia). Additionally, the descriptive analysis revealed that these two types of urban informality are simultaneously spatially persistent phenomena, with pronounced spatial clustering in the peripheral and marginalized areas of the city. According to these results, urban informality occurs locally and is concentrated in poor and peripheral areas.

Moreover, the estimated econometric models, which consider the spatial coexistence of these phenomena and specific unobservable local characteristics that simultaneously affect both types of informality, showed that labor informality positively affects the emergence of informal housing. This positive relationship indicates that slums may be the only housing solution for informal workers within a city, given the employment conditions that exclude them from accessing formal housing options (Burdett & Sudjic, 2011; Cavalcanti et al., 2019). Furthermore, individuals engaged in informal employment may find it more expensive to reside in formal housing than in informal housing. As a result, a significant proportion of low-income individuals involved in informal activities may be drawn to live in marginalized neighborhoods, thereby contributing to their expansion.

Regarding the spillover effects of labor informality on informal housing, the results indicate that the level of informal work in surrounding neighborhoods does not affect the level of informal housing. This may suggest a highly localized relationship between informal housing and labor informality due to the peculiarities of the informal work carried out within homes, leading to precarious housing conditions. This observation may support the idea that the lack of adequate transportation infrastructure in informal peripheries hinders access for a large part of the low-income population to areas where formal employment is concentrated. Hence, poor informal workers tend to concentrate in informal peripheral suburban belts and engage in more decentralized activities than formal workers (Posada & Moreno-Monroy, 2019; Suárez et al., 2016).

Our empirical findings highlight the importance of considering the working conditions of informal settlement dwellers when designing housing solutions for socio-economically disadvantaged citizens living in precarious conditions in peripheral areas. In this regard, within the Sustainable Development Goals (SDGs) framework, strategies have been developed to guarantee access to affordable housing financing options for informal workers. In addition, improving the likelihood of securing formal jobs is expected to reduce urban informality significantly.