Building a Data Platform for Cross-Country Urban Health Studies: the SALURBAL Study

Studies examining urban health and the environment must ensure comparability of measures across cities and countries. We describe a data platform and process that integrates health outcomes together with physical and social environment data to examine multilevel aspects of health across cities in 11 Latin American countries. We used two complementary sources to identify cities with ≥ 100,000 inhabitants as of 2010 in Argentina, Brazil, Chile, Colombia, Costa Rica, El Salvador, Guatemala, Mexico, Nicaragua, Panama, and Peru. We defined cities in three ways: administratively, quantitatively from satellite imagery, and based on country-defined metropolitan areas. In addition to “cities,” we identified sub-city units and smaller neighborhoods within them using census hierarchies. Selected physical environment (e.g., urban form, air pollution and transport) and social environment (e.g., income, education, safety) data were compiled for cities, sub-city units, and neighborhoods whenever possible using a range of sources. Harmonized mortality and health survey data were linked to city and sub-city units. Finer georeferencing is underway. We identified 371 cities and 1436 sub-city units in the 11 countries. The median city population was 234,553 inhabitants (IQR 141,942; 500,398). The systematic organization of cities, the initial task of this platform, was accomplished and further ongoing developments include the harmonization of mortality and survey measures using available sources for between country comparisons. A range of physical and social environment indicators can be created using available data. The flexible multilevel data structure accommodates heterogeneity in the data available and allows for varied multilevel research questions related to the associations of physical and social environment variables with variability in health outcomes within and across cities. The creation of such data platforms holds great promise to support researching with greater granularity the field of urban health in Latin America as well as serving as a resource for the evaluation of policies oriented to improve the health and environmental sustainability of cities.


Introduction
By 2050, at least 70% of the world's population will live in cities [1]. Urban policies impact important determinants of health, health equity, and environmental sustainability [2]. However, there is limited empirical evidence on what factors may make some cities healthier, more equitable, or more environmentally sustainable than others [3][4][5][6]. Latin America, with over 80% of its population living in urban areas [1] and a diversity of geographies and socioeconomic circumstances, presents a unique opportunity to study the impacts of urban living on health.
Cities in Latin America are heterogeneous in size; have diverse physical, social, and economic environments; and are frequently characterized by large social inequalities [3,7]. Cities of the region have also generated innovations in transportation, urban redevelopment, food policies, and social programs [8][9][10][11][12]. The SALURBAL (Salud Urbana en America Latina/Urban Health in Latin America) project launched in 2017 aims to leverage the heterogeneity and innovation observed across Latin American cities to study drivers of urban health, health equity, and environmental sustainability in order to inform urban policies worldwide [13].
A critical need in any cross-city comparison study is the creation of a data platform that can support betweenand within-city comparisons and that can be flexibly linked to various types of data defined at different levels of aggregation [14][15][16][17]. In this paper, we (1) describe the design of the SALURBAL data structure, including how cities are operationalized; (2) summarize the approach to obtaining and harmonizing health data; (3) describe priority social and physical environment indicators; (4) provide examples of how the data structure can be used to answer meaningful research questions about within and between-city variation in health; and (5) discuss selected challenges in creating this resource. Our goal is to inform similar data compilation efforts in other regions in order to enhance the ability to understand drivers of urban health and the impact of various urban policies on health.

A Flexible Multilevel Data Structure
Conducting within-city and cross-city comparisons of urban health necessitates: (1) identifying the universe of Bcities^; (2) operationalizing cities and geographic subunits within cities including neighborhoods in ways that permit linkages to available health and environmental data; (3) obtaining, processing, and harmonizing health data as well as data on social and physical environments; and (4) integrating all available information within a multilevel data structure that allows definition and measurement of constructs and investigation of questions at different levels. SALURBAL developed a data structure that accommodates information available for different geographic units and allows for heterogeneity, both geographically and over time. The process was guided by the principle that pragmatic albeit imperfect geographic definitions would be necessary to advance the project and that these definitions could be refined as the project progresses. The data structure developed allows for complementary analytical approaches that may be used to varying extents as the project evolves.

Identifying and Operationalizing Cities
There is no unique way to define a city, but there are at least three possible types of definitions: (1) administrative definitions based on political or administrative boundaries; (2) definitions based on social or economic functions, such as country-defined metropolitan areas, that capture interconnectedness between a core city and nearby areas; and (3) definitions based on the geographic extent of urban areas identified from satellite imagery using standardized criteria [14][15][16][18][19][20].
An advantage of administrative definitions of cities is that they can be linked to administrative and political responsibility and are often easy to link to health data. A disadvantage is that in large urban areas administratively defined cities often only capture a core city and may not fully represent the entire urban agglomeration. [21,22] Functional definitions such as metropolitan areas better capture the urban agglomeration around administratively defined core cities and have the important advantage of being based on social and economic relations between the core city and its surrounding areas. There are two broad types of functional definitions for these agglomerations. A first definition is based on networks, like water or road networks, while the second definition is based on travel patterns, which define labor or commute areas that are economically linked. Functional definitions receive a variety of names across different countries (e.g., metropolitan areas or urban agglomerations). Considerations of these broader geographic areas may be important to understand the drivers of urban health and the impact of urban health policies. However, these areas are defined using different criteria in different countries making cross-country comparisons difficult and may in some cases include surrounding areas that may not be thought of as urban [15,16].
Definitions based on geographic extent of built-up areas characterize the physical footprint of the city. An important strength of this approach is that it can be applied systematically across countries and over time to track urban growth longitudinally. In addition it captures the boundaries of urbanized areas in a systematic and data-driven fashion [14,19,23,24]. A key disadvantage is that it may be difficult to link other data such as census data or health data to these units because the boundaries identified do not necessarily correspond to any type of administrative area.

SALURBAL Approach to Identifying and Operationalizing Cities
Recognizing the complexity of defining cities and the need to be rigorous but practical in order to capitalize on easily available health data, SALURBAL used an approach that combines various criteria. First, we identified the universe of cities of interest. Second, we operationalized cities and their component units so that various data sources could be linked to them. We used a three-level tiered system to define cities and their subunits. We labeled Bcities^as Blevel 1,^sub-city components as Blevel 2,^and neighborhoods as Blevel 3.F irst Step: Identifying the Universe of SALURBAL Cities The project identified Bcities^with ≥ 100,000 inhabitants as of 2010 in the 11 SALURBAL countries as the universe of interest (here we use the term Bcitiesî n quotes broadly to refer to units that may be an urban agglomeration or some form of administratively defined cities). The countries currently included in the SALURBAL cities platform are Argentina (AR), Brazil (BR), Chile (CL), Colombia (CO), Costa Rica (CR), El Salvador (SV), Guatemala (GT), Mexico (MX), Nicaragua (NI), Panama (PA), and Peru (PE). A cut-off population size of 100,000 inhabitants was selected because it is a threshold often used to define cities and allows the inclusion of Bcities^of varying size [14-16, 20, 25]. Not all Bcities^will be included in all analyses as there will likely be important heterogeneity in the data available to answer a given research question, but identifying the universe is critical to provide context for results.
We created a draft list of Bcities^with 100,000 inhabitants or more by combining information from two sources: The 2010 Atlas of Urban Expansion (AUE) and a database of census data compiled at http://citypopulation. de (henceforth referred to as CP). The 2010 AUE [14] included 377 Bcities^determined to have 100,000 population or more in the 11 SALURBAL countries. Because the AUE defines cities approximately based on their built-up area (analogous to the third definition above), the Bcities^include both urban agglomerations (collections of nearby administratively defined areas) and single administratively defined cities. The CP is dedicated to collecting census data from countries worldwide, including lists of cities and other urban settlements. It is regularly updated with local population estimates [26]. Cities are defined based on a country's administrative definitions such as a municipality or Ba populated center, locality, or an urban area within a municipality.^The preferred year of population counts (or projections) was 2010 to match with the AUE population estimates. The CP list included 539 cities with population ≥ 100,000 in 2010 in the 11 SALURBAL countries.
We matched the AUE list of cities to the CP list by city name, country administrative sub-divisions, and country. All AUE-defined Bcities^had a match in the CP list, but not all cities in the CP list matched to an AUE Bcity.Ŝ atellite imagery in Google Earth (Google, Inc., Mountain View, California), NASA Earth Observatory Night Light Maps 2012 (NASA Worldview application, https://worldview.earthdata.nasa.gov/), and population data from both sources were used to assess whether the cities on the CP list that did not match the AUE list were actually already part of a larger AUE urban agglomeration. If an unmatched city was not part of an AUE defined city, it was added to the list. The final result was a consolidated list of Bcities^of ≥ 100,000 population that integrated information from both databases.
The draft list of Bcities^was reviewed by each country team for face validity resulting in a few minor modifications to the list. A few additional modifications to the list were made as a result of the operationalization of these Bcities^as clusters of smaller sub-city units (which we describe below further) and as a result of the comparison of this list to country-defined metropolitan areas. The full process used to arrive at the final list of 371 Bcities^is summarized in Fig. 1  L1UrbExt: based on the precise built-up urban extent identified systematically using satellite imagery; and (4) L1Excess: similar to L1UrbExt but including urban extents that spill over to neighboring non-SALURBAL countries, (for example Tijuana, Mexico's built area spilling into San Diego, USA). In addition to defining Bcities,^SALURBAL also defined sub-city units (level 2 or L2) and neighborhoods within cities (level 3 or L3).
A summary of the SALURBAL geographic definitions and Blevels^is provided in Table 1.
Defining L1 Administrative Units and Their Component Subunits In order to link city data with health data, it was critical to have a practical definition of Bcities^that could be operationalized as clusters of the smallest geographic units for which health data was either publicly available or easily available upon request (i.e., Fig. 2 Map of SALURBAL countries and cities without requiring georeferencing). We therefore identified the Blevel 2^units (L2) in each country as the geographic administrative units for which health data was easily available and then proceeded to link each Bcity^on our list to the corresponding L2 units. Some Bcities^encompassed only one L2 unit and others included multiple L2 units. In general, L2 units were defined as comunas, municipios, or similar units depending on the country. The cluster of L2 units that were attached to a given L1 was labeled the L1Admin.
A L2 unit was considered to be part of an L1Admin if it covered at least part of urban extent (initially determined by visual inspection of administrative boundaries and satellite imagery and then refined when the L1UrbExt was defined, see below). We included all L2 units that included any portion of the urban extent, even if they also captured areas outside the urban extent. In many cases, the population of the L2 unit will likely lie mostly within the most urbanized area. Subsequently, sensitivity analyses excluding L2 units that are not fully urban (based on census data) or that are only partly include the urban extent can be conducted. In cases where a L2 unit covered more than one Bcity,^it was assigned to the Bcity^with which it shared the largest amount of built-up area.
We identified neighborhoods or L3 units based on census hierarchies within each country. We looked for units that were comparable in size and that were nested within L2 units. L3 units facilitate examination of within-city variability when georeferenced health data are available and constitute building blocks for larger units (L2 units and L1UrbExt units) thus allowing linkage of these larger units to census and other data. In most countries, these units reflect the basic small-area census division for urban areas or for the entire country and were generally defined to facilitate census data collection. In some cases, the administrative units defined as L3 units did not cover the full country and were only available for country-defined Burban areas^(which may not coincide will SALURBAL L1Admin or L1UrbExt). In these cases, SALURBAL developed a strategy for creating SALURBAL defined L3 proxies in areas that were not covered. For details see Appendix Table 8. A summary of the definitions of L2 and L3 units for each country is provided in Table 2. A summary of the numbers of units at each level and their population sizes by country is provided in Table 3.
Defining BMetropolitan Areas^or L1Metro The second definition of Level 1 Bcities,^L1Metro, was based on each country's official definition of metropolitan areas (or similar areas). The definitions of L1Metro differed by country and are summarized in Appendix Table 9. L1Metro units may include multiple L1Admin units in their entirety or partially. In all countries except Argentina and Peru, L1Metro units are aggregates of L2 units. In Argentina, each L1Metro is composed of localidades and in Peru each L1Metro unit is composed of Centros Poblados. These units in both countries can be linked to L3 units. BCity^defined as a single administrative unit (e.g., municipio) or combination of adjacent administrative units (e.g., several municipios) that are part of the urban extent as determined from satellite imagery. Each L1Admin is defined based on its component level 2 units.

L1Metro (metropolitan areas)
BCity^defined following the exact definition that each country provides for metropolitan areas (if available), as a combination of either level 2 units or other units.

L1UrbExt (urban extent)
BCity^defined based on systematically identified urban extent based on built area; boundaries may not overlap exactly with administrative units.
L1Excess (urban extent spillover) BCity^defined as in L1UrbExt but also including the urban extent spilling into a neighboring non-SALURBAL country.
Level 2 Bsub-city^Administrative units (e.g., municipios) nested within L1Admin. In some cases, this may be a single unit for each city, and in other cases, it will be multiple units. In some cases, level 2 units may also be nested within L1Metro.
Level 3 Bneighborhood^Smaller units such as census tracts that can be used as proxies for Bneighborhoods^within a city. Level 3 units will be nested within level 2 units. They will also be approximately linked to L1UrbExt so that census data can be linked to the L1UrbExt for analyses. In some cases, level 3 units may also be nested within L1Metro.

Defining L1UrbExt and Its Spillover Extension
L1Excess While a qualitative assessment of the visual urban extent was used to help identify the L2 units linked to each L1Admin, a more refined, systematic, and quantitative approach was needed to properly define the urban extent of each L1 unit. This process used the Global Urban Footprint (GUF) Dataset [28,29] and followed procedures similar to those used by the Atlas of Urban Expansion to define urban extents with some modifications. The GUF is a worldwide mapping product derived using TerraSAR-X and TanDEM-X images, with a spatial resolution of 0.4 arcsec (~12 m), which classified pixels as built-up and non-built-up [28]. This classification was achieved by highlighting areas of images characterized by highly diverse and heterogeneous backscattering, then using an automated classifier, and followed by semiautomatic post processing. TerraSAR and TanDEM are two satellites designed to acquire high-resolution and good quality radar images covering the entire earth that are used for a wide range of applications, such as topographic mapping, land cover, and land use change detection [28][29][30]. In the process of defining urban extent, the pixels were identified as urban, suburban and rural according to the share of built-up pixels within a 1-km 2 area. Urban clusters were generated by merging the urban, suburban and urbanized open space. A hierarchical agglomerative process was used to join the urban clusters nearby following an inclusion rule. The largest urban cluster in each L1Admin was defined as L1UrbExt. The L1UrbExt analysis identified four potential cases that required further consideration, and if appropriate, modification of L1Admin definitions. First was when the L1UrbExt extended beyond the geographic boundaries of the L1Admin (as first defined using visual inspection of satellite imagery) and therefore the L1Admin needed to be modified by adding a L2 unit (3 cases). Second, when L1UrbExt extended beyond the geographic boundary of the L1Admin by less than 20% of the L1Admin area, in which case we ignored the extra area (3 cases). Third, when the L1UrbExt spills into another L1Admin, in which a case by case analysis identified that separate L1UrbExts were appropriate (2 cases) and no modifications to the L1Admins were made.
Fourth, when the L1UrbExt spilled into a neighboring non-SALURBAL country (10 cases, spilling into Paraguay, Uruguay, the USA, and Venezuela), we created the level 1 excess (L1Excess) to include the nonspillover plus the spillover area into the neighboring country. This was done because even though health data outside of SALURBAL countries would not be linked to the L1Admin, some measures of the L1UrbExt (such as air pollution) might be relevant to health on the other side of the border.

Linking Health and Environmental Data at Various Geographic Levels
A summary of the geographic hierarchies and possible linkages using the SALURBAL geographic levels is provided in Fig. 3. The L1Admin, level 2, and level 3 hierarchy is straightforward as units are nested within each other (Fig. 3). In many cases, L1Metros are also  clusters of L2 units, although they are sometimes larger and may encompass a different set of L2s than the L1Admins (Fig. 3). In countries where L1Metros are not defined using L2s (Argentina and Peru), they can be defined using L3s (Fig. 3a). L1UrbExts will be approximately linked to L3s (Fig. 3b). L3 units will be considered part of a L1UrbExt if they contain any portion of the area of the L1UrbExt. If necessary, weights may be used to attribute L3 data to the L1UrbExt in cases where the L3 is only partly covered by the L1UrbExt. A spatial representation of these linkages is shown in Fig. 3c. These data structures facilitate linkages of health and environmental data at various levels. They also allow for differences across data and countries in the spatial resolutions available. SALURBAL is in the process of georeferencing mortality and survey data to L3 whenever possible, thus allowing for analyses at finer spatial resolution. In the meantime, analyses based on L1Admins or L2s can proceed as aggregate data for these units is more readily available. The data structure proposed can be expanded to include time-varying health and environmental data linked to various geographic units. This is easily accomplished by adding calendar year indicators to spatial IDs. A challenge will be harmonizing units in cases where spatial definitions of administratively defined geographic units (such as L2 units, L3 units, or L1Metros) have changed over time. Definitions of L1UrbExts are designed to change over time in order to capture longitudinal changes in urban extent. If feasible, SALURBAL may explore approaches to harmonize geographic boundaries of selected units over time, as has been done in the USA [31,32].

Mortality Data
We obtained individual-level mortality records at L2 from each country (except Nicaragua) for as many years as possible. These records included at least age, sex, location of residence, and cause of death. Most countries had data on education of the decedent. We harmonized all variables to guarantee comparability. Sex was categorized as male, female, or missing. Age was operationalized in single-year intervals whenever possible (all countries except Colombia). Education was harmonized using the IPUMS international recode [33]. Causes of death were coded using either ICD9 or ICD10 codes (depending on the year) and grouped in categories using the World Health Organization Global Health Estimates (GHE) classification [34].
Three potential issues challenge the quality of mortality data, and we evaluated and addressed each one as follows. First, some mortality records have missing information on the variables of interest (age, sex, cause of death, location of residence, and education). To evaluate this issue, we computed missing data proportions for each variable by country and year (see Appendix Table 10). To impute these missing values, we used conditional probabilistic imputation by sex and cause of death (for age), by age and cause of death (for sex), and by age and sex (for cause of death), all stratified by country and year. For example, records with missing age or sex were imputed to a 5-year age category or to male or female probabilistically, based on the observed distributions of each variable in their corresponding sex and cause of death (for age) or age and cause of death (for sex). Records with missing cause of death were imputed to either ill-defined diseases or injuries of ill-defined intent (see below), probabilistically by age and sex. Mortality records with missing location of residence at L2 were dropped, as these would not be linkable to a SALURBAL area.
Second, some mortality records had a cause of death coded as an ill-defined disease (e.g., R chapter of the ICD10 classification) or as an injury of ill-defined intent (e.g., codes Y10-Y34 and Y872 in the ICD10 classification). We evaluated this issue by computing the proportion of all deaths that were coded as ill-defined diseases or injuries of ill-defined intent (see Appendix Table 10). Given that these ill-defined deaths make it challenging to estimate the public health burden of diseases and injuries, we redistributed them to other GHE categories proportionally by age, sex, country, and year. This approach is similar to that used by the GHE study [34].
Third, not all deaths that occur in a country are registered in a vital registrations system. The phenomenon of lack of complete coverage or undercounting biases down the estimates of mortality. We evaluated this issue by obtaining estimates of undercounting from the United Nations Development Program (see Appendix  Table 11). These estimates apply to the entire country, so we obtained more detailed estimates wherever possible. This is especially important in countries with wide geographic variability and high rates of undercounting such as Peru and Colombia, where (a) a national estimate of undercounting my underestimate or overestimate the lack of coverage and (b) this differentiation may be meaningful (as the overall rates are high). In countries where this distinction was less relevant, we applied a blanket correction for the entire country. Appendix Table 11 details the specific corrections we applied to each country, whether they are L2 specific (or at a higher level) and whether they are age or sex specific. Overall, we applied these correction factors by using them to estimate the number of missing deaths (for the entire country or each L2, for all age groups or a specific age group, and for both sexes or each specific gender, see Appendix Table 11). Once we estimated the number of missing deaths, we sampled this number with replacement (hot deck imputation) from the observed deaths following similar procedures as the GHE.
The final product was a collection of datasets with information on each individual mortality record, including year, country, location of residence (at L2), age (in single or 5-year groups), sex, education (if available), and cause of death (3 variables: ICD-10 code, GHE classification, and GHE classification with redistributed ill-defined diseases and injuries of ill-defined intent). Moreover, we created an aggregated dataset, summing the number of deaths in each year, L2, 5-year age category, sex, education (if available), and cause of death using the GHE classification (with and without applying the redistribution of ill-defined diseases and injuries of illdefined intent). These aggregated datasets contained both the number of deaths corrected for lack of complete coverage and the uncorrected number of observed deaths.

Population Data
In order to use mortality records to estimate mortality rates, we had to obtain estimates of the population counts by year, location of residence (L2), age, and sex. These population projections were obtained from the census bureaus of each country. In most countries, estimates by age and sex were available at L2. In some cases (Peru and El Salvador), estimates by age and sex were only available at higher administrative levels instead of L2, while data for L2 was available by either age or sex. In these cases, we estimated L2 population counts by age and sex by redistributing the counts by age or sex to the proportions observed at higher levels. More details are available in Appendix Table 12.
Survey Data SALURBAL plans to compile health surveys and any available cohort studies in order to develop harmonized measures of health behaviors and other risk factors. Our initial focus has been on national health surveys with a focus on non-communicable disease risk factors. The design and sampling approaches differ somewhat across countries, but all allow linkage to SALURBAL L2 units (and may in the future also allow linkages to L3 units). Some surveys are based only on self-report information, but others include objective measurements such as height, weight and blood pressure [35]. A data harmonization effort was launched to create comparable measures of selected domains. The design of the surveys implies that their geographic level or representativeness may differ (Appendix Table 13). This will be taken into consideration if prevalence estimates for specific cities are generated. In addition, we will use statistical approaches that can be leveraged to derive small area estimates even when the survey was not specifically designed for that purpose [36][37][38][39]. For the most part, however, survey data will be used in multilevel analyses to estimate associations of city or neighborhood-level factors with individual-level outcomes. Sampling design and weights will be taken into consideration, if appropriate, as has been done in prior work [40][41][42][43]. Appendix Table 13 summarizes methodological and geographic characteristics of surveys selected for initial harmonization.
SALURBAL developed a process for harmonization of priority domains that included the following: (1) identifying and collating questions and responses by domain, with attention to skip patterns and respondent universe; (2) reviewing surveys conducted by others such as the Centers for Disease Control and Prevention or the World Health Organization for standard variable definitions as well as harmonization approaches proposed by other projects [33,44,45]; (3) proposing harmonized variable definitions and response categories with attention to differences in wording across countries; and (4) applying the harmonization and revising the protocol as needed, based on descriptive statistics of initial harmonized variables. In some cases, multiple versions of a variable were created due to country differences that did not allow a single harmonized variable. The harmonized data will be linked to L2 and L3 whenever possible. In addition SALURBAL is exploring other methods to combine heterogeneous data across countries using approaches, such as differential item functioning [46], meta-analysis approaches [47,48], and fused LASSO models or other machine learning approaches [49]. Priority domains of interest and variable definitions are shown in Table 4. Other domains will be harmonized as the study advances.

Characterizing Urban Social and Physical Environments
Several key social and physical environment domains were identified as potentially relevant to health and health inequalities in cities by the SALURBAL team. The domains as well as selected indicators for these domains and the data sources that are being used to estimate them are summarized in Tables 5 and 6. Indicators may be defined for L3, L2 or L1Admin, L1Metro, and L1UrbExt based on the construct and data availability.

A Typology of Multilevel Urban Health Questions
The data structure created by SALURBAL can be flexibly used to answer a number of different types of research questions relevant to understanding the drivers of urban health in cities and the policies that may be most effective in improving population health and reducing health inequities. By capitalizing on heterogeneity across cities and within cities, we can identify important city-level and neighborhood-level drivers of variability in health and in health inequities thus obtaining clues on causes of population health and health inequities. The types of questions that can be explored with the data platform we developed include, for example (1) questions about factors associated with between-city differences in health; (2) questions about factors associated with within-city (neighborhood) differences in health; (3) questions about the impact of city context on inequities in health; and (4) longitudinal questions about factors associated with changes over time at the city or neighborhood level. By exploring these questions, we will obtain evidence important to identifying what strategies can be used by cities to promote health and health equity. A simplified typology of selected questions is shown in Table 7. Many additional possibilities will be possible.

Challenges
Data Availability, Heterogeneity, and Quality >Finding and obtaining the data necessary to answer important questions about environments and health in cities remains an important challenge. For example, mortality data at L2 have been generally easy to obtain, but health survey data have been more complicated to access, even for larger geographic areas, like L2 units. Social and physical environment data have to be compiled from multiple heterogeneous data sources with differences across countries in what information is available. Although many countries have rich health surveys, details on the wording of the questions and the skip patterns used can make harmonization difficult. Data quality also varies both within countries and between countries. The team has devised strategies to address quality issues whenever possible via evidence-based corrections (as described for the mortality data) or through sensitivity analyses.  Longitudinal Data A goal of the SALURBAL project is to be able to measure changes in the physical and social environment over time and their effect on health outcomes. Some countries will have more data going further back in time than others. While some data may be available going back 20 or 30 years or more, the quality of older data may not be suitable for the project or may not be available at the city or smaller spatial resolution levels; thus, some longitudinal analyses may not include all countries or all cities. Accommodating differences in spatial definitions of L1Admins and other units over time will also present important challenges. a Population for the urban extent (L1UrbExt) was estimated based on the ratio of built area in the urban extent to the total built area in each L2 unit. Estimated populations for each built-up L2 unit were then aggregated up to the L1UrbEx b Although we found that WorldPop's downscaled data performed poorly in a few cases, we assumed that WorldPop's relative concentration of population within a given unit would be representative of the actual population concentration. A measure of disagreement between WorldPop and Census data is included in our data to describe uncertainty in the Gini coefficient resulting from WorldPop population data c A patch is defined as a homogeneous region of a specific land cover type that differs from its surrounding d These air pollution measures are from air quality monitors maintained by local governments

Conclusion
The creation of this unique data platform presents enormous opportunities for research, capacity building, and policy impact and positions SALURBAL as an example of an integrated comprehensive approach to characterizing and studying the drivers of urban health in low and middle income countries. The flexible, multilevel data structure allows for heterogeneity in space and time at various scales and can accommodate data available with varying degrees of space and time resolution. Various geographic definitions of cities allow for flexibility in analyses depending on research questions and data availability. Additional health data spanning multiple types of health outcomes across multiple ages can be easily incorporated. The data resource will allow a number of analyses to identify factors related to health, health equity, and environmental sustainability of cities. In addition, it is a rich resource for capacity building in the region. The use and presentation of these data (with all its limitations) will necessarily spur improvements to the regional data systems. In addition, continuous updates to the data resources, including addition of other health outcomes across the lifecourse and the incorporation of data on the timing and characteristics of various policies implemented, will provide  opportunities for continuous policy impact evaluation into the future.  In these cases, a proxy level 3 was created by using the level 2.5 in its entirety or (in cases where the level 2.5 included areas with L3s defined) by defining the L3 proxy as the L2.5 units minus any area covered by the L3s. (Note that in Colombia, a Bsector rural^is available in non-urban areas but it sometimes includes sectores urbanos, which is why the approach of treating the sector rural as a L2.5 and subtracting L3s when appropriate to create an L3 proxy had to be used)  Nicaragua mortality data is not currently available at the necessary geographic to the SALURBAL study currently  Population center (centro poblado) or group of population centers with a contiguous urban area with over 500,000 inhabitants. A population center is defined as a group of inhabitants who are linked by economic, social, cultural, or historical factors. a These two types of entities for Brazil encompass different sets of non-overlapping cities b This includes legally organized metropolitan areas with political administrative structure (N = 6) and officially recognized metropolitan areas without legally organized political administrative structures (N = 9, referred to as both Bareas metropolitanas^and Baglomeraciones urbanas.^Population, economic, and other statistics are calculated for both types of areas by government organizations [27]    In Mexico, the households with the greatest deficiencies were identified through the construction of a defined social lag(rezago) index for the AGEBs; the index that was built is similar to the social lag (rezago) index built by the National Evaluation Council of the Social Development Policy for localities in 2005. https://www.coneval.org.mx/rw/resource/coneval/med_ pobreza/1024.pdf