Skip to main content

Building a Data Platform for Cross-Country Urban Health Studies: the SALURBAL Study


Studies examining urban health and the environment must ensure comparability of measures across cities and countries. We describe a data platform and process that integrates health outcomes together with physical and social environment data to examine multilevel aspects of health across cities in 11 Latin American countries. We used two complementary sources to identify cities with ≥ 100,000 inhabitants as of 2010 in Argentina, Brazil, Chile, Colombia, Costa Rica, El Salvador, Guatemala, Mexico, Nicaragua, Panama, and Peru. We defined cities in three ways: administratively, quantitatively from satellite imagery, and based on country-defined metropolitan areas. In addition to “cities,” we identified sub-city units and smaller neighborhoods within them using census hierarchies. Selected physical environment (e.g., urban form, air pollution and transport) and social environment (e.g., income, education, safety) data were compiled for cities, sub-city units, and neighborhoods whenever possible using a range of sources. Harmonized mortality and health survey data were linked to city and sub-city units. Finer georeferencing is underway. We identified 371 cities and 1436 sub-city units in the 11 countries. The median city population was 234,553 inhabitants (IQR 141,942; 500,398). The systematic organization of cities, the initial task of this platform, was accomplished and further ongoing developments include the harmonization of mortality and survey measures using available sources for between country comparisons. A range of physical and social environment indicators can be created using available data. The flexible multilevel data structure accommodates heterogeneity in the data available and allows for varied multilevel research questions related to the associations of physical and social environment variables with variability in health outcomes within and across cities. The creation of such data platforms holds great promise to support researching with greater granularity the field of urban health in Latin America as well as serving as a resource for the evaluation of policies oriented to improve the health and environmental sustainability of cities.


By 2050, at least 70% of the world’s population will live in cities [1]. Urban policies impact important determinants of health, health equity, and environmental sustainability [2]. However, there is limited empirical evidence on what factors may make some cities healthier, more equitable, or more environmentally sustainable than others [3,4,5,6]. Latin America, with over 80% of its population living in urban areas [1] and a diversity of geographies and socioeconomic circumstances, presents a unique opportunity to study the impacts of urban living on health.

Cities in Latin America are heterogeneous in size; have diverse physical, social, and economic environments; and are frequently characterized by large social inequalities [3, 7]. Cities of the region have also generated innovations in transportation, urban redevelopment, food policies, and social programs [8,9,10,11,12]. The SALURBAL (Salud Urbana en America Latina/Urban Health in Latin America) project launched in 2017 aims to leverage the heterogeneity and innovation observed across Latin American cities to study drivers of urban health, health equity, and environmental sustainability in order to inform urban policies worldwide [13].

A critical need in any cross-city comparison study is the creation of a data platform that can support between- and within-city comparisons and that can be flexibly linked to various types of data defined at different levels of aggregation [14,15,16,17]. In this paper, we (1) describe the design of the SALURBAL data structure, including how cities are operationalized; (2) summarize the approach to obtaining and harmonizing health data; (3) describe priority social and physical environment indicators; (4) provide examples of how the data structure can be used to answer meaningful research questions about within and between-city variation in health; and (5) discuss selected challenges in creating this resource. Our goal is to inform similar data compilation efforts in other regions in order to enhance the ability to understand drivers of urban health and the impact of various urban policies on health.

A Flexible Multilevel Data Structure

Conducting within-city and cross-city comparisons of urban health necessitates: (1) identifying the universe of “cities”; (2) operationalizing cities and geographic subunits within cities including neighborhoods in ways that permit linkages to available health and environmental data; (3) obtaining, processing, and harmonizing health data as well as data on social and physical environments; and (4) integrating all available information within a multilevel data structure that allows definition and measurement of constructs and investigation of questions at different levels. SALURBAL developed a data structure that accommodates information available for different geographic units and allows for heterogeneity, both geographically and over time. The process was guided by the principle that pragmatic albeit imperfect geographic definitions would be necessary to advance the project and that these definitions could be refined as the project progresses. The data structure developed allows for complementary analytical approaches that may be used to varying extents as the project evolves.

Identifying and Operationalizing Cities

There is no unique way to define a city, but there are at least three possible types of definitions: (1) administrative definitions based on political or administrative boundaries; (2) definitions based on social or economic functions, such as country-defined metropolitan areas, that capture interconnectedness between a core city and nearby areas; and (3) definitions based on the geographic extent of urban areas identified from satellite imagery using standardized criteria [14,15,16, 18,19,20].

An advantage of administrative definitions of cities is that they can be linked to administrative and political responsibility and are often easy to link to health data. A disadvantage is that in large urban areas administratively defined cities often only capture a core city and may not fully represent the entire urban agglomeration. [21, 22]

Functional definitions such as metropolitan areas better capture the urban agglomeration around administratively defined core cities and have the important advantage of being based on social and economic relations between the core city and its surrounding areas. There are two broad types of functional definitions for these agglomerations. A first definition is based on networks, like water or road networks, while the second definition is based on travel patterns, which define labor or commute areas that are economically linked. Functional definitions receive a variety of names across different countries (e.g., metropolitan areas or urban agglomerations). Considerations of these broader geographic areas may be important to understand the drivers of urban health and the impact of urban health policies. However, these areas are defined using different criteria in different countries making cross-country comparisons difficult and may in some cases include surrounding areas that may not be thought of as urban [15, 16].

Definitions based on geographic extent of built-up areas characterize the physical footprint of the city. An important strength of this approach is that it can be applied systematically across countries and over time to track urban growth longitudinally. In addition it captures the boundaries of urbanized areas in a systematic and data-driven fashion [14, 19, 23, 24]. A key disadvantage is that it may be difficult to link other data such as census data or health data to these units because the boundaries identified do not necessarily correspond to any type of administrative area.

SALURBAL Approach to Identifying and Operationalizing Cities

Recognizing the complexity of defining cities and the need to be rigorous but practical in order to capitalize on easily available health data, SALURBAL used an approach that combines various criteria. First, we identified the universe of cities of interest. Second, we operationalized cities and their component units so that various data sources could be linked to them. We used a three-level tiered system to define cities and their subunits. We labeled “cities” as “level 1,” sub-city components as “level 2,” and neighborhoods as “level 3.”

First Step: Identifying the Universe of SALURBAL Cities

The project identified “cities” with ≥ 100,000 inhabitants as of 2010 in the 11 SALURBAL countries as the universe of interest (here we use the term “cities” in quotes broadly to refer to units that may be an urban agglomeration or some form of administratively defined cities). The countries currently included in the SALURBAL cities platform are Argentina (AR), Brazil (BR), Chile (CL), Colombia (CO), Costa Rica (CR), El Salvador (SV), Guatemala (GT), Mexico (MX), Nicaragua (NI), Panama (PA), and Peru (PE). A cut-off population size of 100,000 inhabitants was selected because it is a threshold often used to define cities and allows the inclusion of “cities” of varying size [14,15,16, 20, 25]. Not all “cities” will be included in all analyses as there will likely be important heterogeneity in the data available to answer a given research question, but identifying the universe is critical to provide context for results.

We created a draft list of “cities” with 100,000 inhabitants or more by combining information from two sources: The 2010 Atlas of Urban Expansion (AUE) and a database of census data compiled at (henceforth referred to as CP). The 2010 AUE [14] included 377 “cities” determined to have 100,000 population or more in the 11 SALURBAL countries. Because the AUE defines cities approximately based on their built-up area (analogous to the third definition above), the “cities” include both urban agglomerations (collections of nearby administratively defined areas) and single administratively defined cities. The CP is dedicated to collecting census data from countries worldwide, including lists of cities and other urban settlements. It is regularly updated with local population estimates [26]. Cities are defined based on a country’s administrative definitions such as a municipality or “a populated center, locality, or an urban area within a municipality.” The preferred year of population counts (or projections) was 2010 to match with the AUE population estimates. The CP list included 539 cities with population ≥ 100,000 in 2010 in the 11 SALURBAL countries.

We matched the AUE list of cities to the CP list by city name, country administrative sub-divisions, and country. All AUE-defined “cities” had a match in the CP list, but not all cities in the CP list matched to an AUE “city.” Satellite imagery in Google Earth (Google, Inc., Mountain View, California), NASA Earth Observatory Night Light Maps 2012 (NASA Worldview application,, and population data from both sources were used to assess whether the cities on the CP list that did not match the AUE list were actually already part of a larger AUE urban agglomeration. If an unmatched city was not part of an AUE defined city, it was added to the list. The final result was a consolidated list of “cities” of ≥ 100,000 population that integrated information from both databases.

The draft list of “cities” was reviewed by each country team for face validity resulting in a few minor modifications to the list. A few additional modifications to the list were made as a result of the operationalization of these “cities” as clusters of smaller sub-city units (which we describe below further) and as a result of the comparison of this list to country-defined metropolitan areas. The full process used to arrive at the final list of 371 “cities” is summarized in Fig. 1 and shown geographically in Fig. 2.

Fig. 1
figure 1

The process used to identity “cities” in 11 SALURBAL countries. Footnotes: (a) During the operationalization of cities as clusters of L2 units (see section on definition of L1Admin), it was observed that some L1 “cities” shared contiguous built-up areas. This resulted in adjacent L1 units being combined with other L1 units (N = 4) to create a consolidated “city”. Additionally, some administrative cities with populations of less than 100,000 were observed to share contiguous built-up areas with other nearby administrative cities such that together these units met the population eligibility requirement. This resulted in the addition of a small number (N = 4) of L1 units. (b) As a result of comparing the list of cities with what some countries deem as “metropolitan areas,” 3 new L1 units were added and 17 were merged with other L1 units. (c) MA = metropolitan areas

Fig. 2
figure 2

Map of SALURBAL countries and cities

Second Step: Creating Complementary Operational Definitions of “Cities” and Subunits Within Them

SALURBAL created four complementary definitions of “cities” or level 1 units: (1) L1Admin: based on the built-up urban extent approximated through clusters of administratively defined areas; (2) L1Metro: based on country specific definitions of metropolitan areas; (3) L1UrbExt: based on the precise built-up urban extent identified systematically using satellite imagery; and (4) L1Excess: similar to L1UrbExt but including urban extents that spill over to neighboring non-SALURBAL countries, (for example Tijuana, Mexico’s built area spilling into San Diego, USA). In addition to defining “cities,” SALURBAL also defined sub-city units (level 2 or L2) and neighborhoods within cities (level 3 or L3). A summary of the SALURBAL geographic definitions and “levels” is provided in Table 1.

Table 1 SALURBAL definitions of cities and their component units at various levels

Defining L1 Administrative Units and Their Component Subunits

In order to link city data with health data, it was critical to have a practical definition of “cities” that could be operationalized as clusters of the smallest geographic units for which health data was either publicly available or easily available upon request (i.e., without requiring georeferencing). We therefore identified the “level 2” units (L2) in each country as the geographic administrative units for which health data was easily available and then proceeded to link each “city” on our list to the corresponding L2 units. Some “cities” encompassed only one L2 unit and others included multiple L2 units. In general, L2 units were defined as comunas, municipios, or similar units depending on the country. The cluster of L2 units that were attached to a given L1 was labeled the L1Admin.

A L2 unit was considered to be part of an L1Admin if it covered at least part of urban extent (initially determined by visual inspection of administrative boundaries and satellite imagery and then refined when the L1UrbExt was defined, see below). We included all L2 units that included any portion of the urban extent, even if they also captured areas outside the urban extent. In many cases, the population of the L2 unit will likely lie mostly within the most urbanized area. Subsequently, sensitivity analyses excluding L2 units that are not fully urban (based on census data) or that are only partly include the urban extent can be conducted. In cases where a L2 unit covered more than one “city,” it was assigned to the “city” with which it shared the largest amount of built-up area.

We identified neighborhoods or L3 units based on census hierarchies within each country. We looked for units that were comparable in size and that were nested within L2 units. L3 units facilitate examination of within-city variability when georeferenced health data are available and constitute building blocks for larger units (L2 units and L1UrbExt units) thus allowing linkage of these larger units to census and other data. In most countries, these units reflect the basic small-area census division for urban areas or for the entire country and were generally defined to facilitate census data collection. In some cases, the administrative units defined as L3 units did not cover the full country and were only available for country-defined “urban areas” (which may not coincide will SALURBAL L1Admin or L1UrbExt). In these cases, SALURBAL developed a strategy for creating SALURBAL defined L3 proxies in areas that were not covered. For details see Appendix Table 8. A summary of the definitions of L2 and L3 units for each country is provided in Table 2. A summary of the numbers of units at each level and their population sizes by country is provided in Table 3.

Table 2 SALURBAL cities and definitions of Level 2 and 3 units by country
Table 3 Descriptive statistics and population sizes of L1Admin, L2, and L3 units by country. Population for L1Admin and L2 are from 2010 census projections from each country. L3 population sizes are from most recent census data available

Defining “Metropolitan Areas” or L1Metro

The second definition of Level 1 “cities,” L1Metro, was based on each country’s official definition of metropolitan areas (or similar areas). The definitions of L1Metro differed by country and are summarized in Appendix Table 9. L1Metro units may include multiple L1Admin units in their entirety or partially. In all countries except Argentina and Peru, L1Metro units are aggregates of L2 units. In Argentina, each L1Metro is composed of localidades and in Peru each L1Metro unit is composed of Centros Poblados. These units in both countries can be linked to L3 units.

Defining L1UrbExt and Its Spillover Extension L1Excess

While a qualitative assessment of the visual urban extent was used to help identify the L2 units linked to each L1Admin, a more refined, systematic, and quantitative approach was needed to properly define the urban extent of each L1 unit. This process used the Global Urban Footprint (GUF) Dataset [28, 29] and followed procedures similar to those used by the Atlas of Urban Expansion to define urban extents with some modifications. The GUF is a worldwide mapping product derived using TerraSAR-X and TanDEM-X images, with a spatial resolution of 0.4 arcsec (~ 12 m), which classified pixels as built-up and non-built-up [28]. This classification was achieved by highlighting areas of images characterized by highly diverse and heterogeneous backscattering, then using an automated classifier, and followed by semi-automatic post processing. TerraSAR and TanDEM are two satellites designed to acquire high-resolution and good quality radar images covering the entire earth that are used for a wide range of applications, such as topographic mapping, land cover, and land use change detection [28,29,30]. In the process of defining urban extent, the pixels were identified as urban, suburban and rural according to the share of built-up pixels within a 1-km2 area. Urban clusters were generated by merging the urban, suburban and urbanized open space. A hierarchical agglomerative process was used to join the urban clusters nearby following an inclusion rule. The largest urban cluster in each L1Admin was defined as L1UrbExt.

The L1UrbExt analysis identified four potential cases that required further consideration, and if appropriate, modification of L1Admin definitions. First was when the L1UrbExt extended beyond the geographic boundaries of the L1Admin (as first defined using visual inspection of satellite imagery) and therefore the L1Admin needed to be modified by adding a L2 unit (3 cases). Second, when L1UrbExt extended beyond the geographic boundary of the L1Admin by less than 20% of the L1Admin area, in which case we ignored the extra area (3 cases). Third, when the L1UrbExt spills into another L1Admin, in which a case by case analysis identified that separate L1UrbExts were appropriate (2 cases) and no modifications to the L1Admins were made.

Fourth, when the L1UrbExt spilled into a neighboring non-SALURBAL country (10 cases, spilling into Paraguay, Uruguay, the USA, and Venezuela), we created the level 1 excess (L1Excess) to include the non-spillover plus the spillover area into the neighboring country. This was done because even though health data outside of SALURBAL countries would not be linked to the L1Admin, some measures of the L1UrbExt (such as air pollution) might be relevant to health on the other side of the border.

Linking Health and Environmental Data at Various Geographic Levels

A summary of the geographic hierarchies and possible linkages using the SALURBAL geographic levels is provided in Fig. 3. The L1Admin, level 2, and level 3 hierarchy is straightforward as units are nested within each other (Fig. 3). In many cases, L1Metros are also clusters of L2 units, although they are sometimes larger and may encompass a different set of L2s than the L1Admins (Fig. 3). In countries where L1Metros are not defined using L2s (Argentina and Peru), they can be defined using L3s (Fig. 3a). L1UrbExts will be approximately linked to L3s (Fig. 3b). L3 units will be considered part of a L1UrbExt if they contain any portion of the area of the L1UrbExt. If necessary, weights may be used to attribute L3 data to the L1UrbExt in cases where the L3 is only partly covered by the L1UrbExt. A spatial representation of these linkages is shown in Fig. 3c. These data structures facilitate linkages of health and environmental data at various levels. They also allow for differences across data and countries in the spatial resolutions available. SALURBAL is in the process of georeferencing mortality and survey data to L3 whenever possible, thus allowing for analyses at finer spatial resolution. In the meantime, analyses based on L1Admins or L2s can proceed as aggregate data for these units is more readily available.

Fig. 3
figure 3

a Links between L1Admin, L2, L3, and L1Metro. The L1Metro may or may not overlap with the level 2 units that compose the L1Admin and may or may not include L2 units outside of the L1Admin. Depending on the country, the L1Metro may include all L3 units within L2’s or only selected L3 units within them. b Links between L1Admin, level 2, level 3, and L1UrbExt. The L1UrbExt may include subsets of L3 units within the L1Admin. In a small number of cases a variant of the L1UrbExt that extends outside the boundaries of the country (and the L1Admin) was created and called L1Excess. c Spatial representation of links between L1Admin, L2, L3, and L1UrbExt. L1Metro is not shown but may include L2s or L3s beyond the L1Admin or may encompass only part of the L1Admin.

The data structure proposed can be expanded to include time-varying health and environmental data linked to various geographic units. This is easily accomplished by adding calendar year indicators to spatial IDs. A challenge will be harmonizing units in cases where spatial definitions of administratively defined geographic units (such as L2 units, L3 units, or L1Metros) have changed over time. Definitions of L1UrbExts are designed to change over time in order to capture longitudinal changes in urban extent. If feasible, SALURBAL may explore approaches to harmonize geographic boundaries of selected units over time, as has been done in the USA [31, 32].

Obtaining and Harmonizing Health Data

Mortality Data

We obtained individual-level mortality records at L2 from each country (except Nicaragua) for as many years as possible. These records included at least age, sex, location of residence, and cause of death. Most countries had data on education of the decedent. We harmonized all variables to guarantee comparability. Sex was categorized as male, female, or missing. Age was operationalized in single-year intervals whenever possible (all countries except Colombia). Education was harmonized using the IPUMS international recode [33]. Causes of death were coded using either ICD9 or ICD10 codes (depending on the year) and grouped in categories using the World Health Organization Global Health Estimates (GHE) classification [34].

Three potential issues challenge the quality of mortality data, and we evaluated and addressed each one as follows. First, some mortality records have missing information on the variables of interest (age, sex, cause of death, location of residence, and education). To evaluate this issue, we computed missing data proportions for each variable by country and year (see Appendix Table 10). To impute these missing values, we used conditional probabilistic imputation by sex and cause of death (for age), by age and cause of death (for sex), and by age and sex (for cause of death), all stratified by country and year. For example, records with missing age or sex were imputed to a 5-year age category or to male or female probabilistically, based on the observed distributions of each variable in their corresponding sex and cause of death (for age) or age and cause of death (for sex). Records with missing cause of death were imputed to either ill-defined diseases or injuries of ill-defined intent (see below), probabilistically by age and sex. Mortality records with missing location of residence at L2 were dropped, as these would not be linkable to a SALURBAL area.

Second, some mortality records had a cause of death coded as an ill-defined disease (e.g., R chapter of the ICD10 classification) or as an injury of ill-defined intent (e.g., codes Y10–Y34 and Y872 in the ICD10 classification). We evaluated this issue by computing the proportion of all deaths that were coded as ill-defined diseases or injuries of ill-defined intent (see Appendix Table 10). Given that these ill-defined deaths make it challenging to estimate the public health burden of diseases and injuries, we redistributed them to other GHE categories proportionally by age, sex, country, and year. This approach is similar to that used by the GHE study [34].

Third, not all deaths that occur in a country are registered in a vital registrations system. The phenomenon of lack of complete coverage or undercounting biases down the estimates of mortality. We evaluated this issue by obtaining estimates of undercounting from the United Nations Development Program (see Appendix Table 11). These estimates apply to the entire country, so we obtained more detailed estimates wherever possible. This is especially important in countries with wide geographic variability and high rates of undercounting such as Peru and Colombia, where (a) a national estimate of undercounting my underestimate or overestimate the lack of coverage and (b) this differentiation may be meaningful (as the overall rates are high). In countries where this distinction was less relevant, we applied a blanket correction for the entire country. Appendix Table 11 details the specific corrections we applied to each country, whether they are L2 specific (or at a higher level) and whether they are age or sex specific. Overall, we applied these correction factors by using them to estimate the number of missing deaths (for the entire country or each L2, for all age groups or a specific age group, and for both sexes or each specific gender, see Appendix Table 11). Once we estimated the number of missing deaths, we sampled this number with replacement (hot deck imputation) from the observed deaths following similar procedures as the GHE.

The final product was a collection of datasets with information on each individual mortality record, including year, country, location of residence (at L2), age (in single or 5-year groups), sex, education (if available), and cause of death (3 variables: ICD-10 code, GHE classification, and GHE classification with redistributed ill-defined diseases and injuries of ill-defined intent). Moreover, we created an aggregated dataset, summing the number of deaths in each year, L2, 5-year age category, sex, education (if available), and cause of death using the GHE classification (with and without applying the redistribution of ill-defined diseases and injuries of ill-defined intent). These aggregated datasets contained both the number of deaths corrected for lack of complete coverage and the uncorrected number of observed deaths.

Population Data

In order to use mortality records to estimate mortality rates, we had to obtain estimates of the population counts by year, location of residence (L2), age, and sex. These population projections were obtained from the census bureaus of each country. In most countries, estimates by age and sex were available at L2. In some cases (Peru and El Salvador), estimates by age and sex were only available at higher administrative levels instead of L2, while data for L2 was available by either age or sex. In these cases, we estimated L2 population counts by age and sex by redistributing the counts by age or sex to the proportions observed at higher levels. More details are available in Appendix Table 12.

Survey Data

SALURBAL plans to compile health surveys and any available cohort studies in order to develop harmonized measures of health behaviors and other risk factors. Our initial focus has been on national health surveys with a focus on non-communicable disease risk factors. The design and sampling approaches differ somewhat across countries, but all allow linkage to SALURBAL L2 units (and may in the future also allow linkages to L3 units). Some surveys are based only on self-report information, but others include objective measurements such as height, weight and blood pressure [35]. A data harmonization effort was launched to create comparable measures of selected domains. The design of the surveys implies that their geographic level or representativeness may differ (Appendix Table 13). This will be taken into consideration if prevalence estimates for specific cities are generated. In addition, we will use statistical approaches that can be leveraged to derive small area estimates even when the survey was not specifically designed for that purpose [36,37,38,39]. For the most part, however, survey data will be used in multilevel analyses to estimate associations of city or neighborhood-level factors with individual-level outcomes. Sampling design and weights will be taken into consideration, if appropriate, as has been done in prior work [40,41,42,43]. Appendix Table 13 summarizes methodological and geographic characteristics of surveys selected for initial harmonization.

SALURBAL developed a process for harmonization of priority domains that included the following: (1) identifying and collating questions and responses by domain, with attention to skip patterns and respondent universe; (2) reviewing surveys conducted by others such as the Centers for Disease Control and Prevention or the World Health Organization for standard variable definitions as well as harmonization approaches proposed by other projects [33, 44, 45]; (3) proposing harmonized variable definitions and response categories with attention to differences in wording across countries; and (4) applying the harmonization and revising the protocol as needed, based on descriptive statistics of initial harmonized variables. In some cases, multiple versions of a variable were created due to country differences that did not allow a single harmonized variable. The harmonized data will be linked to L2 and L3 whenever possible. In addition SALURBAL is exploring other methods to combine heterogeneous data across countries using approaches, such as differential item functioning [46], meta-analysis approaches [47, 48], and fused LASSO models or other machine learning approaches [49]. Priority domains of interest and variable definitions are shown in Table 4. Other domains will be harmonized as the study advances.

Table 4 SALURBAL health survey domains and selected measures.

Characterizing Urban Social and Physical Environments

Several key social and physical environment domains were identified as potentially relevant to health and health inequalities in cities by the SALURBAL team. The domains as well as selected indicators for these domains and the data sources that are being used to estimate them are summarized in Tables 5 and 6. Indicators may be defined for L3, L2 or L1Admin, L1Metro, and L1UrbExt based on the construct and data availability.

Table 5 Social environment domains and indicators
Table 6 Physical and built environment domains and indicators

A Typology of Multilevel Urban Health Questions

The data structure created by SALURBAL can be flexibly used to answer a number of different types of research questions relevant to understanding the drivers of urban health in cities and the policies that may be most effective in improving population health and reducing health inequities. By capitalizing on heterogeneity across cities and within cities, we can identify important city-level and neighborhood-level drivers of variability in health and in health inequities thus obtaining clues on causes of population health and health inequities.

The types of questions that can be explored with the data platform we developed include, for example (1) questions about factors associated with between-city differences in health; (2) questions about factors associated with within-city (neighborhood) differences in health; (3) questions about the impact of city context on inequities in health; and (4) longitudinal questions about factors associated with changes over time at the city or neighborhood level. By exploring these questions, we will obtain evidence important to identifying what strategies can be used by cities to promote health and health equity. A simplified typology of selected questions is shown in Table 7. Many additional possibilities will be possible.

Table 7 A typology of selected urban health questions that can be investigated with the SALURBAL data platform


Data Availability, Heterogeneity, and Quality

>Finding and obtaining the data necessary to answer important questions about environments and health in cities remains an important challenge. For example, mortality data at L2 have been generally easy to obtain, but health survey data have been more complicated to access, even for larger geographic areas, like L2 units. Social and physical environment data have to be compiled from multiple heterogeneous data sources with differences across countries in what information is available. Although many countries have rich health surveys, details on the wording of the questions and the skip patterns used can make harmonization difficult. Data quality also varies both within countries and between countries. The team has devised strategies to address quality issues whenever possible via evidence-based corrections (as described for the mortality data) or through sensitivity analyses.

Spatial Resolution

The informativeness of health data is maximized if the data can be georeferenced. Currently, most SALURBAL data are available at L1Admin and L2, though each country team is advancing efforts to geocode mortality, live births, and health data to at least L3. The challenges of georeferencing have included coming to agreement with appropriate government institutions, selecting a method for georeferencing and a high-quality source of geocoding while maintaining confidentiality, and obtaining the appropriate geodatabases of the geographic boundaries of the L3 or smaller units.

Longitudinal Data

A goal of the SALURBAL project is to be able to measure changes in the physical and social environment over time and their effect on health outcomes. Some countries will have more data going further back in time than others. While some data may be available going back 20 or 30 years or more, the quality of older data may not be suitable for the project or may not be available at the city or smaller spatial resolution levels; thus, some longitudinal analyses may not include all countries or all cities. Accommodating differences in spatial definitions of L1Admins and other units over time will also present important challenges.


The creation of this unique data platform presents enormous opportunities for research, capacity building, and policy impact and positions SALURBAL as an example of an integrated comprehensive approach to characterizing and studying the drivers of urban health in low and middle income countries. The flexible, multilevel data structure allows for heterogeneity in space and time at various scales and can accommodate data available with varying degrees of space and time resolution. Various geographic definitions of cities allow for flexibility in analyses depending on research questions and data availability. Additional health data spanning multiple types of health outcomes across multiple ages can be easily incorporated. The data resource will allow a number of analyses to identify factors related to health, health equity, and environmental sustainability of cities. In addition, it is a rich resource for capacity building in the region. The use and presentation of these data (with all its limitations) will necessarily spur improvements to the regional data systems. In addition, continuous updates to the data resources, including addition of other health outcomes across the lifecourse and the incorporation of data on the timing and characteristics of various policies implemented, will provide opportunities for continuous policy impact evaluation into the future.


  1. 1.

    Population Division of the Department of Economic and Social Affairs. 2017 Revision of World Population Prospects: United Nations. New York, NY; 2017.

  2. 2.

    Singh S, Beagley J. Health and the New Urban Agenda: a mandate for action. Lancet. 2017;389(10071):801–2.

    Article  PubMed  Google Scholar 

  3. 3.

    United Nations Human Settlements Programme (UN-Habitat). Urbanization and Development: Emerging Futures - World Cities Report 2016: United Nations. Nairobi, Kenya; 2016. HS/038/16E.

  4. 4.

    Galea S, Freudenberg N, Vlahov D. Cities and population health. Soc Sci Med. Mar 2005;60(5):1017–33.

    Article  PubMed  Google Scholar 

  5. 5.

    Harpham T. Urban health in developing countries: what do we know and where do we go? Health Place. 2009;15(1):107–16.

    Article  PubMed  Google Scholar 

  6. 6.

    Rydin Y, Bleahu A, Davies M, Dávila JD, Friel S, de Grandis G, et al. Shaping cities for health: complexity and the planning of urban environments in the 21st century. Lancet. Jun 2012;379(9831):2079–108.

    Article  PubMed  PubMed Central  Google Scholar 

  7. 7.

    Tsounta E, Osueke A. What is behind Latin America’s declining income inequality?; 2014. IMF Working Paper. Available at: Accessed 23 March 2018.

  8. 8.

    World Development book case study: sustainable urban development in Curitiba. 2016. Available at: Accessed 23 March 2018.

  9. 9.

    Cerda M, Morenoff JD, Hansen BB, et al. Reducing violence by transforming neighborhoods: a natural experiment in Medellin, Colombia. Am J Epidemiol. 2012;175(10):1045–53.

    Article  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Gomez LF, Sarmiento R, Ordoñez MF, et al. Urban environment interventions linked to the promotion of physical activity: a mixed methods study applied to the urban context of Latin America. Social Science & Medicine. 2015;131:18–30.

    Article  Google Scholar 

  11. 11.

    Jirón P. Sustainable Urban Mobility in Latin America. Nairobi: UN Habitat; 2013. Accessed 23 March 2018.

  12. 12.

    Mc GJ. Radical cities: across Latin America in search of a new architecture. London: Verso Books; 2015.

    Google Scholar 

  13. 13.

    Diez Roux AV, Slesinski SC, Alazraqui M, et al. A novel international partnership for actionable evidence on urban health in latin America: LAC-urban health and SALURBAL. Global Challenges. 2018;0(0):1800013.

  14. 14.

    Angel S, Blei AM, Parent J, et al. Atlas of urban expansion—2016 edition, Volume 1: Areas and Densities, New York: New York University, Nairobi: UN- Habitat, and Cambridge, MA: Lincoln Institute of Land Policy; 2016.

  15. 15.

    Frey WH, Zimmer Z. Defining the city. In: Paddison R, editor. Handbook of urban studieS. London: Sage Publications; 2001. p. 14–35.

    Chapter  Google Scholar 

  16. 16.

    Parr JB. Spatial definitions of the city: four perspectives. Urban Stud. 2007;44(2):381–92.

    Article  Google Scholar 

  17. 17.

    Beenackers MA, Doiron D, Fortier I, et al. MINDMAP: establishing an integrated database infrastructure for research in ageing, mental well-being, and the urban environment. BMC Public Health. 2018;18(1):158.

    Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Population Division of the Department of Economic and Social Affairs. The world’s cities in 2016 - data booklet. New York, New York: United Nations; 2016.

  19. 19.

    Fang C, Yu D. Urban agglomeration: an evolving concept of an emerging phenomenon. Landsc Urban Plann. 2017;162:126–36.

    Article  Google Scholar 

  20. 20.

    European Environment Agency. Urban atlas. Copenhagen: European Union; 2010.

  21. 21.

    Cottineau C, Hatna E, Arcaute E, Batty M. Diverse cities or the systematic paradox of urban scaling laws. Computers, Environment Urban Syst. 2017;63:80–94.

    Article  Google Scholar 

  22. 22.

    Arcaute E, Hatna E, Ferguson P, Youn H, Johansson A, Batty M. Constructing cities, deconstructing scaling laws. J R Soc Interface. 2015;12(102):20140745.

  23. 23.

    Keola S, Andersson M, Hall O. Monitoring economic development from space: using nighttime light and land cover data to measure economic growth. World Development. 2015;66:322–34.

    Article  Google Scholar 

  24. 24.

    Potere D, Schneider A, Angel S, Civco DL. Mapping urban areas on a global scale: which of the eight maps now available is more accurate? Int J Remote Sens. 2009;30(24):6531–58.

    Article  Google Scholar 

  25. 25.

    Dirección Corporativa de Análisis Económico y Conocimiento para el Desarrollo. Crecimiento urbano y acceso a oportunidades: un desafío para América Latina. Bogota: CAF; 2017.

  26. 26.

    Brinkhoff T. City population. Available at: Accessed 1 March, 2017.

  27. 27.

    Patiño Villa CA, Zambrano Pantoja F, García Mora D, Hernández Fernández HA. Debates de Gobierno Urbano es una publicación seriada del Instituto de Estudios Urbanos de la Universidad Nacional de Colombia, Sede Bogotá. Bogota DC, Colombia: Instituto de Estudios Urbanos; 2016.

    Google Scholar 

  28. 28.

    Esch T, Thiel M, Schenk A, Roth A, Muller A, Dech S. Delineation of urban footprints from TerraSAR-X data by analyzing speckle characteristics and intensity information. IEEE Trans Geosci Remote Sens. 2010;48(2):905–16.

    Article  Google Scholar 

  29. 29.

    Esch T, Heldens W, Hirner A, et al. Breaking new ground in mapping human settlements from space—the Global Urban Footprint. ISPRS J Photogramm Remote Sens. 2017;134:30–42.

    Article  Google Scholar 

  30. 30.

    Taubenböck H, Esch T, Felbier A, Wiesner M, Roth A, Dech S. Monitoring urbanization in mega cities from space. Remote Sens Environ. 2012;117:162–76.

    Article  Google Scholar 

  31. 31.

    Logan JR, Xu Z, Stults BJ. Interpolating U.S. decennial census tract data from as early as 1970 to 2010: a longitudinal tract database. Prof Geogr. 2014;66(3):412–20.

    Article  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Logan JR, Stults BJ, Xu Z. Validating population estimates for harmonized census tract data, 2000–2010. Ann Am Assoc Geogr. 2016;106(5):1013–29.

    PubMed  PubMed Central  Google Scholar 

  33. 33.

    Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 7.0 Minneapolis, MN; 2018.

  34. 34.

    Division of Evidence and Research - Department of Information. WHO methods and data sources for country-level causes of death 2000–2015. Geneva: World Health Organization; 2017.

  35. 35.

    Mindell JS, Moody A, Vecino-Ortiz AI, Alfaro T, Frenz P, Scholes S, et al. Comparison of health examination survey methods in Brazil, Chile, Colombia, Mexico, England, Scotland, and the United States. Am J Epidemiol. 2017;186(6):648–58.

    Article  PubMed  PubMed Central  Google Scholar 

  36. 36.

    Song L, Mercer L, Wakefield J, Laurent A, Solet D. Using small-area estimation to calculate the prevalence of smoking by subcounty geographic areas in King County, Washington, Behavioral Risk Factor Surveillance System, 2009-2013. Prev Chronic Dis. 2016;13:E59.

    Article  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Zhang X, Holt JB, Lu H, et al. Multilevel regression and poststratification for small-area estimation of population health outcomes: a case study of chronic obstructive pulmonary disease prevalence using the Behavioral Risk Factor Surveillance System. Am J Epidemiol. 2014;179(8):1025–33.

    Article  PubMed  Google Scholar 

  38. 38.

    Zhang X, Holt JB, Yun S, Lu H, Greenlund KJ, Croft JB. Validation of multilevel regression and poststratification methodology for small area estimation of health indicators from the Behavioral Risk Factor Surveillance System. Am J Epidemiol. 2015;182(2):127–37.

    Article  PubMed  PubMed Central  Google Scholar 

  39. 39.

    Zhang Z, Zhang L, Penman A, May W. Using small-area estimation method to calculate county-level prevalence of obesity in Mississippi, 2007–2009. Prev Chronic Dis. 2011;8(4):A85.

    PubMed  PubMed Central  Google Scholar 

  40. 40.

    Chen C, Wakefield J, Lumely T. The use of sampling weights in Bayesian hierarchical models for small area estimation. Spatial Spatio-temporal Epidemiol. 2014;11(Supplement C):33–43.

    Article  Google Scholar 

  41. 41.

    Mercer L, Wakefield J, Chen C, Lumley T. A comparison of spatial smoothing methods for small area estimation with sampling weights. Spatial Statist. 2014;8(Supplement C):69–85.

    Article  Google Scholar 

  42. 42.

    Rabe-Hesketh S, Skrondal A. Multilevel modelling of complex survey data. J R Stat Soc Ser A (Statistics in Society). 2006;169(4):805–27.

    Article  Google Scholar 

  43. 43.

    Lovasi GS, Fink DS, Mooney SJ, Link BG. Model-based and design-based inference goals frame how to account for neighborhood clustering in studies of health in overlapping context types. SSM - Population Health. 2017;3(Supplement C):600–8.

    Article  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Jeffers K, King M, Cleveland L, Kelly HP. Data resource profile: IPUMS-International. Int J Epidemiol. 2017;46(2):390–1.

    PubMed  PubMed Central  Google Scholar 

  45. 45.

    Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System. US Department of Health and Human Services. Available at: Accessed 23 March 2018.

  46. 46.

    Teresi JA, Fleishman JA. Differential item functioning and health assessment. Qual Life Res. 2007;16(1):33–42.

    Article  PubMed  Google Scholar 

  47. 47.

    Smith SJ, Steinberg KK, Thacker SB. Methods for pooled analyses of epidemiologic studies. Epidemiology. 1994;5(3):381.

    Article  CAS  PubMed  Google Scholar 

  48. 48.

    Smith-Warner SA, Spiegelman D, Ritz J, Albanes D, Beeson WL, Bernstein L, et al. Methods for pooling results of epidemiologic studiesthe pooling project of prospective studies of diet and cancer. Am J Epidemiol. 2006;163(11):1053–64.

    Article  PubMed  Google Scholar 

  49. 49.

    Tang L, Song PXK. Fused lasso approach in regression coefficients clustering - learning parameter heterogeneity in data integration. J Mach Learn Res. 2016;17:1–23.

  50. 50.

    Centers for Disease Control and Prevention. Indicator definitions—diabetes. Atlanta, GA: US Department of Human and Health Services; 2018.

    Google Scholar 

  51. 51.

    Diabetes Programme. Global report on diabetes. Geneva, Switzerland: World Health Organization; 2016.

  52. 52.

    Dept. of Noncommunicable Disease Surveillance. Definition, diagnosis and classification of diabetes mellitus and its complications : report of a WHO consultation. Part 1, Diagnosis and classification of diabetes mellitus. Geneva, Switzerland: World Health Organization; 1999.

  53. 53.

    Centers for Disease Control and Prevention. Indicator definitions—cardiovascular disease. Atlanta, GA: US Department of Human and Health Services; 2018.

    Google Scholar 

  54. 54.

    Zhou B, Bentham J, Di Cesare M, et al. Worldwide trends in blood pressure from 1975 to 2015: a pooled analysis of 1479 population-based measurement studies with 19·1 million participants. Lancet. 2017;389(10064):37–55.

    Article  Google Scholar 

  55. 55.

    Norm C, Pedro O, JM G, et al. Implementing standardized performance indicators to improve hypertension control at both the population and healthcare organization levels. J Clin Hypertens. 2017;19(5):456–61.

    Article  Google Scholar 

  56. 56.

    OECD. Health at a glance 2017: OECD indicators. Paris: OECD Publishing; 2017.

  57. 57.

    Centers for Disease Control and Prevention. Surveillance of certain health behaviors and conditions among states and selected local areas―Behavioral Risk Factor Surveillance System (BRFSS). MMWR Surveillance Summary. 2008;57(SS–7):2–3,11.

  58. 58.

    Centers for Disease Control and Prevention. Indicator definitions—tobacco. Atlanta, GA: US Department of Human and Health Services; 2018.

    Google Scholar 

  59. 59.

    Global Tobacco Surveillance System (GTSS). Global adult tobacco survey (GATS) indicator guidelines: definition and Syntax. Geneva, Switzerland: World Health Organization; 2009.

  60. 60.

    Centers for Disease Control and Prevention. Indicator definitions—alcohol. Atlanta, GA: US Department of Human and Health Services; 2018.

    Google Scholar 

  61. 61.

    World Health Organization. Global status report on alcohol and health. Geneva: WHO; 2011.

    Google Scholar 

  62. 62.

    World Health Organization. BMI classification. World Health Organization. Available at: Accessed 10 May, 2018.

  63. 63.

    International Physical Activity Questionnaire Group. Guidelines for Data Processing and Analysis of the International Physical Activity Questionnaire (IPAQ) – Short and Long Forms. Available at: Accessed 23 March 2018.

  64. 64.

    Prevention of Noncommunicable Diseases Department SaP-BP. Global Physical Activity Questionnaire (GPAQ) Analysis Guide. World Health Organization. Available at: Accessed 23 March 2018.

  65. 65.

    World Health Organization. Measuring the intake of fruit and vegetables. World Health Organization. Available at: Accessed 23 March 2018.

  66. 66.

    National Cancer Institute. Dietary Assessment Primer: Food Frequency Questionnaire at a Glance. Available at: Accessed 10 May, 2018.

  67. 67.

    Centers for Disease Control and Prevention. Indicator definitions—nutrition, physical activity, and weight status. Atlanta, GA: US Department of Human and Health Services; 2018.

    Google Scholar 

  68. 68.

    CAF Banco de Desarollo de América Latina. Encuesta CAF 2016. Available at: Accessed 23 March 2018.

  69. 69.

    Wardrop NA, Jochem WC, Bird TJ, Chamberlain HR, Clarke D, Kerr D, et al. Spatially disaggregated population estimates in the absence of national population and housing census data. Proc Natl Acad Sci. 2018;115:3529–37.

    Article  CAS  PubMed  Google Scholar 

  70. 70.

    Boeing G. OSMnx: new methods for acquiring, constructing, analyzing, and visualizing complex street networks. Comput Environ Urban Syst. 2017;65:126–39.

    Article  Google Scholar 

  71. 71.

    van Donkelaar A, Martin RV, Brauer M, et al. Global estimates of fine particulate matter using a combined geophysical-statistical method with information from satellites, models, and monitors. Environ Sci Technol. 2016;50(7):3762–72.

    Article  CAS  PubMed  Google Scholar 

  72. 72.

    Boys BL, Martin RV, van Donkelaar A, et al. Fifteen-year global time series of satellite-derived fine particulate matter. Environ Sci Technol. 2014;48(19):11109–18.

    Article  CAS  PubMed  Google Scholar 

  73. 73.

    Av D, Martin RV, Brauer M, Boys BL. Use of satellite observations for long-term exposure assessment of global concentrations of fine particulate matter. Environ Health Perspect. 2015;123(2):135–43.

    Article  CAS  Google Scholar 

Download references


The SALURBAL Group includes Marcio Alazraqui, Hugo Spinelli, Carlos Guevel, Vanessa Di Cecco, Adela Tisnés, Carlos Leveau, Adrián Santoro, and Damián Herkovits: National University of Lanus, Buenos Aires, Argentina; Nelson Gouveia: Universidad de São Paulo, São Paulo, Brazil; Mauricio Barreto and Gervásio Santos: Oswaldo Cruz Foundation, Salvador Bahia, Brazil; Leticia Cardoso, Mariana Carvalho de Menezes, and Maria de Fatima de Pina: Oswaldo Cruz Foundation, Rio de Janeiro, Brazil; Waleska Teixeira Caiaffa, Amélia Augusta de Lima Friche, and Amanda Cristina de Souza Andrade: Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Patricia Frenz, Tania Alfaro, Cynthia Córdova, Pablo Ruiz, and Mauricio Fuentes: School of Public Health, University of Chile, Santiago, Chile; Alejandra Vives Vergara, Alejandro Salazar, Andrea Cortinez-O’Ryan, Cristián Schmitt, Francisca Gonzalez, Fernando Baeza, and Flavia Angelini: Department of Public Health, Pontificia Universidad Católica de Chile, Santiago, Chile; Olga Lucía Sarmiento Dueñas, Diana Higuera, and Catalina González: School of Medicine, Universidad de los Andes, Bogotá, Colombia; Felipe Montes, Andres F. Useche, Oscar Guaje, Ana Maria Jaramillo, and Luis Angel Guzmán: School of Engineering, Universidad de los Andes, Bogotá, Colombia. Philipp Hessel and Diego Lucumi: School of Government, Universidad de los Andes, Bogotá, Colombia; Jose David Meisel: Universidad de Ibagué, Ibagué, Colombia; Eliana Martinez: Universidad de Antioquia, Medellín, Colombia; María F. Kroker-Lobos, Manuel Ramirez-Zea, and Kevin Martinez Folger: INCAP Research Center for the Prevention of Chronic Diseases (CIIPEC), Institute of Nutrition of Central America and Panama (INCAP), Guatemala City, Guatemala; Tonatiuh Barrientos-Gutierrez, Carolina Perez-Ferrer, Javier Prado-Galbarro, Filipa de Castro, and Rosalba Rojas-Martínez: Instituto Nacional de Salud Pública, Mexico City, Mexico; J. Jaime Miranda, Akram Hernández Vásquez, and Francisco Diez-Canseco: School of Medicine, Universidad Peruana Cayetano Heredia, Lima, Peru; Ross Hammond: Brookings Institute, Washington, D.C., USA; Daniel Rodriguez and Iryna Dronova: Department of City and Regional Planning, the University of California Berkeley, USA; Brisa N. Sanchez: University of Michigan School of Public Health, Ann Arbor, Michigan, USA; Peter Hovmand: Washington University in St. Louis, St. Louis, Missouri, USA; Ricardo Jordán Fuchs and Juliet Braslow: Economic Commission for Latin America and the Caribbean (ECLAC); Jose Siri: United Nations University International Institute for Global Health (UNU-IIGH); Ana Diez Roux, Amy Auchincloss, Brent Langellier, Gina Lovasi, Leslie McClure, Yvonne Michael, Harrison Quick, D. Alex Quistberg, Jose Tapia Granados, Kari Moore, Felipe Garcia-España, Usama Bilal, and Ivana Stankov: Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania, USA; Salud Urbana en América Latina (SALURBAL), Urban Health in Latin America, is a 5-year project that studies how urban environments and urban policies impact the health of city residents throughout Latin America. SALURBAL’s findings inform policies and interventions to create healthier, more equitable, and more sustainable cities worldwide. SALURBAL is funded by the Wellcome Trust [205177/Z/16/Z]. More information about the project can be found at

Research Support

This project was supported by the Wellcome Trust initiative, “Our Planet, Our Health” (Grant 205177/Z/16/Z).

Author information




Corresponding author

Correspondence to Ana V. Diez Roux.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Table 8 Name, definition, and size of SALURBAL level 3 units by country
Table 9 Definition of L1Metros and component subunits for each country. Component unit (L2 unit when possible or other) is italicized in each definition. Definitions and number of units are based on census data closest to 2010
Table 10 Missing data and ill-defined deaths for mortality data in SALURBAL cities in 10 countries. Data corresponds to the latest available year for every country
Table 11 Undercounting estimates and specificity of correction approaches for mortality data by country
Table 12 Population projections data sources and characteristics for use as denominators with mortality data.
Table 13 Health risk factor and chronic diseases surveys from SALURBAL countries used in the initial stage of harmonization

Rights and permissions

OpenAccess This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Quistberg, D.A., Diez Roux, A.V., Bilal, U. et al. Building a Data Platform for Cross-Country Urban Health Studies: the SALURBAL Study. J Urban Health 96, 311–337 (2019).

Download citation

  • Published:

  • Issue Date:

  • DOI:


  • Urban health
  • Latin America
  • Cities
  • Built environment
  • Social Environment
  • Multilevel Models
  • Mortality
  • Health Survey