Background

Most studies of physical activity and built environments have been conducted in the USA, Australia and Western Europe, with recent studies extending findings to Japan [1], Colombia [2], China [3], Brazil [4], and elsewhere [5, 6]. Although there have been important consistencies in the results [7], it is not possible to interpret different patterns of association by country because common methods were not employed. Further, the limited variability in environmental exposures and physical activity within countries may have led to underestimation of the strength of association [7]. International evidence about the associations of the built environment with physical activity could inform international and national policies and guide the implementation of international health strategies, such as those from the World Health Organization [8]. Only international studies using comparable methods can establish the extent to which environment and policy associations with physical activity are generalizable across countries or are country-specific. Such findings could inform evidence-based international and country-specific interventions to increase physical activity, which could help underpin initiatives to prevent obesity and other non-communicable diseases, whose prevalence is high in developed countries and growing rapidly in developing countries [9, 10].

One of the main aims of the IPEN (International Physical Activity and Environment Network) Adult project is to conduct multi-country pooled analyses using comparable measures to estimate associations of perceived attributes of the neighborhood environment with physical activity and health-related outcomes across 12 countries [11]. The Neighborhood Environment Walkability Scale (NEWS) [12] and its abbreviated form (NEWS-A) [13] were selected as measures of perceived neighborhood characteristics hypothesized to be related to physical activity, especially walking (e.g., land use mix, street connectivity, and traffic safety). The NEWS and NEWS-A have been adapted and/or translated for use in various countries [14–21]. Both the original and adapted/translated versions have shown acceptable test-retest reliability, concurrent validity with respect to objective environmental measures, and some evidence of criterion validity with respect to physical activity outcomes [12, 15–18, 21–24].

To date, four studies have established measurement models of the factor-analyzable items of the original [13, 25] and adapted versions of the NEWS and/or NEWS-A [16, 17] based on Confirmatory Factor Analyses (CFA). The CFA models describe the patterns of associations between items and their underlying latent constructs (e.g., street connectivity or aesthetics), thereby providing recommendations on how to summarize and score participants’ responses on the subscales [13, 25]. Specifically, CFA can evaluate the extent to which responses to questionnaire items (e.g., perceived high crime rate, feeling unsafe to walk in the neighborhood during the day, or at night) that are hypothesized to measure the same construct, i.e., a latent factor (e.g., safety from crime), share common variance. For each questionnaire item, CFA yields standardized factor loadings that indicate the magnitude and direction of associations between the responses on the items and their underlying latent construct. For example, a standardized factor loading of −0.85 for the item “perceived high crime rate” on the latent factor of crime safety would indicate that its responses are strongly negatively correlated with the factor.

All four studies that conducted a CFA of the NEWS/NEWS-A used a two-stage cluster sampling strategy to recruit participants from selected study areas; accordingly, they distinguished individual- from area-level measurement models, the former based on within-area differences and the latter on between-area differences in responses to the individual items. Because individual-level measurement models are more reflective of how perceptions of environmental characteristics group into factors and, thus, are likely more generalizable across populations than their area-level counterparts, it has been suggested that the NEWS and NEWS-A be scored according to the individual-level models [13, 16, 17, 25]. The full and abbreviated versions of these instruments showed similar individual-level measurement models, including the following multi-item latent factors: land use mix – access to services; street connectivity; infrastructure for walking/cycling; aesthetics; traffic safety; and safety from crime. Yet, several between-study discrepancies were noted, which might be attributed to differences in items or to somewhat limited generalizability of specific measurement models to other geographical or cultural settings [25].

A prerequisite for conducting pooled analyses of multi-country data is the use of common protocols, including comparable exposure and outcome measures. In the case of the IPEN Adult project, a requirement for a country’s inclusion was measurement of perceived neighborhood attributes, one of the main exposure measures, using the NEWS or NEWS-A. However, IPEN Adult was not a multi-center study funded at the outset with all countries required to follow an exact protocol. To optimize resources, some IPEN countries obtained local funding and proceeded with their study before a funded coordinating center was in place to implement tight quality control. This funding model enabled more countries to contribute data, strengthened the study, and allowed countries some flexibility in matching the protocol to the local context, thereby making the study more relevant to their national situation. The downside, however, was a lack of comparability in some study elements, including the set of NEWS items used. Hence, the aim of the present paper was to compare subsets of comparable NEWS/NEWS-A items used across the 12 IPEN countries and, based on empirical evidence on their CFA-derived individual-level measurement models, to propose scoring protocols that maximize the cross-country comparability of responses. CFA-based NEWS/NEWS-A scores with demonstrated comparability across countries could then be used either for pooled analyses combining countries or for study-specific non-pooled analyses, thereby facilitating both cross-study and cross-country comparisons. This information is important not only to studies included in the IPEN initiative but also to other researchers who have used or will use the NEWS/NEWS-A, currently the most popular measures of perceived neighborhood environment worldwide.
This paper also proposes a relatively simple analytical approach that can be used to create comparable measures for multi-country pooled analyses when the measurement protocol deviates somewhat across study sites.

Methods

Neighborhood selection

The IPEN Adult study is an observational epidemiologic multi-country cross-sectional study. Twelve countries participated: Australia, Belgium, Brazil, Colombia, Czech Republic, Denmark, Hong Kong, Mexico, New Zealand, Spain, the United Kingdom, and the United States. Study participants were selected from neighborhoods chosen to maximize the variance in neighborhood walkability and Socio-Economic Status (SES); this occurred in all countries except Spain, where neighborhood SES was not available. The goal of the study design was to have equal numbers of neighborhoods stratified as follows: high walkable/high SES, high walkable/low SES, low walkable/high SES, and low walkable/low SES. For selection of study neighborhoods, all countries except Spain used a neighborhood walkability index that was objectively defined using Geographic Information Systems data at the smallest administrative unit available. A neighborhood walkability index for the whole area of study was first developed [26]. Then, neighborhoods with relatively lower and higher walkability index scores by lower and higher SES indicators were selected. In nine countries, participants were recruited across the seasons to control for variations in weather that may affect physical activity. In six countries, participants were recruited equally across the neighborhoods by season. The details for each country can be found elsewhere [11].
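The stratified design described above (four walkability-by-SES strata) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IPEN selection procedure: it assumes a simple median split on a precomputed walkability index and a single SES indicator, whereas the actual index construction and cut-points used by each country are documented elsewhere [11, 26]. The function name, neighborhood IDs, and values are hypothetical.

```python
from collections import Counter
from statistics import median


def classify_neighborhoods(walkability, ses):
    """Assign neighborhoods to the four walkability-by-SES strata.

    walkability, ses: dicts mapping a (hypothetical) neighborhood ID to its
    walkability index score and SES indicator. ASSUMPTION: a median split
    defines "high" vs "low"; real studies may use other cut-points.
    """
    w_cut = median(walkability.values())
    s_cut = median(ses.values())
    strata = {}
    for nid, w_score in walkability.items():
        w_label = "high walkable" if w_score > w_cut else "low walkable"
        s_label = "high SES" if ses[nid] > s_cut else "low SES"
        strata[nid] = f"{w_label}/{s_label}"
    return strata


# Invented example values, for illustration only.
walk = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
ses_scores = {"A": 10, "B": 40, "C": 20, "D": 30}
strata = classify_neighborhoods(walk, ses_scores)
# The design goal is an equal number of neighborhoods per stratum.
print(Counter(strata.values()))
```

In practice the strata would then be sampled so that each of the four cells contributes a similar number of neighborhoods.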

Recruitment and participants

The required recruitment strategy was systematic selection of participants with addresses in the chosen neighborhoods. Adults living in the selected neighborhoods were contacted and invited to complete surveys on their physical activity and perceptions of the environment. Study dates ranged from 2002 to 2011. Each country obtained ethical approval from its local institutions, and all participants provided informed consent. Across countries, eligible ages for recruitment spanned 15–84 years. Four countries recruited participants by phone and mail, and eight contacted households in person. Databases of resident addresses from commercial and government sources were used for the phone and mail recruitment. For the in-person recruitment, standard procedures for identifying households and participants within a household were employed [27]. In Hong Kong, intercept interviews were conducted in residential areas where individual addresses were not available, for example, in large apartment buildings with restricted access. Six countries used monetary incentives, and four countries provided non-monetary incentives, including feedback on physical activity [28]. Six countries employed self-report methods (mail and online surveys) to collect survey data, four countries used interviews, and two countries used both self-report and interview methods. Further details on participant recruitment techniques and response rates across countries can be found elsewhere [11].

There was a total of N = 14,309 participants, with country samples ranging from 512 to 2,650 individuals (see Table 1). Participants’ mean age was 42.3 (SD = 12.9) years. Overall, 57.1% were women, 38% had a high school degree, 43.9% a college degree, and 59.6% were married or living with a partner. Demographic descriptive statistics for each country are also shown in Table 1.

Table 1 Overall and country-specific sample characteristics

Measures

Versions of the neighborhood environment walkability scale

General overview

The full and abbreviated versions of the NEWS and NEWS-A comprise 67 and 54 items, respectively [12, 13]. They gauge the following perceived neighborhood attributes: (1) residential density; (2) land use mix – diversity; (3) land use mix – access; (4) street connectivity; (5) infrastructure and safety for walking; (6) aesthetics; (7) traffic safety; (8) safety from crime; (9) streets not having many cul-de-sacs; (10) physical barriers to walking; (11) parking difficult in local shopping areas; and (12) hilly streets in the neighborhood. The United States employed the original full NEWS; New Zealand used the NEWS-A; while the remaining 10 countries used various combinations of NEWS/NEWS-A items, in their original or slightly modified forms. All countries included at least some items gauging the first 10 neighborhood attributes listed above. All non-English versions of the instrument were forward-translated from English into the local language, culturally adapted (when needed), and back-translated into English. At least two expert raters reviewed all versions of the NEWS/NEWS-A and evaluated item content equivalence.

Subscales (original and adapted)

The Residential density subscale of the original NEWS and NEWS-A consists of six items rated on a 5-point scale (1 = none; 2 = a few; 3 = some; 4 = most; 5 = all) (Table 2). Eleven of the 12 countries used the original response scale, while Belgium used a 3-point scale (1 = none; 2 = some; 3 = many). For the purpose of this study, responses on this subscale were recoded to range from 0 to 4 (0 to 3 for Belgium, where none, some, and many were recoded as 0, 2, and 3, respectively). This was done to enhance the accuracy of the measure, so that the perceived absence of a specific type of density-related attribute (e.g., apartments or condos with 4–6 stories) would not contribute positively to the total residential density score. Six countries used all six original items. The remaining countries reduced the number of items by merging the content of adjacent items or by omitting those gauging the highest levels of residential density. Finally, Hong Kong added an item to account for extreme levels of residential density (high-rise buildings with more than 20 stories) (Table 2). Ratings on the original six items of this subscale are weighted by the relative average residential densities that they represent, namely 1, 12, 10, 25, 50, and 75, respectively [12], and a summary residential density score is obtained by summing all weighted item scores. Table 2 describes the modifications to the scoring procedures of the original version of this subscale that will be adopted to make it comparable across IPEN countries (see column Proposed solutions).
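The weighted summation for the six original items can be sketched as below, assuming the 5-point ratings are first recoded to 0–4 as described. The function names are hypothetical, and the sketch covers only the standard six-item version; the country-specific adjustments listed in Table 2 (merged or omitted items, Belgium’s 3-point scale, Hong Kong’s additional item) are not reproduced here.

```python
# Weights for the six original NEWS residential density items [12],
# in questionnaire order.
DENSITY_WEIGHTS = (1, 12, 10, 25, 50, 75)


def recode_density_rating(rating):
    """Recode a 5-point rating (1 = none ... 5 = all) to 0-4 so that
    'none' contributes nothing to the summary score."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be 1-5, got {rating!r}")
    return rating - 1


def residential_density_score(ratings):
    """Sum of recoded ratings, each weighted by the relative density
    of the housing type that the item represents."""
    if len(ratings) != len(DENSITY_WEIGHTS):
        raise ValueError("expected one rating per original item")
    return sum(w * recode_density_rating(r)
               for w, r in zip(DENSITY_WEIGHTS, ratings))


print(residential_density_score([3, 3, 2, 1, 1, 1]))  # 1*2 + 12*2 + 10*1 = 36
```

The recoding step matters: on the original 1–5 coding, a respondent reporting none of every housing type would receive 1 + 12 + 10 + 25 + 50 + 75 = 173 instead of 0.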

Table 2 Differences in NEWS/NEWS-A items* across 12 countries and proposed solutions to maximize comparability across countries

The Land use mix – diversity subscale of the original NEWS and NEWS-A is assessed by the perceived walking proximity from home to 23 different types of destinations, with responses ranging from a 1–5-minute walking distance (coded as 5, indicative of high walkability) to a >30-minute walking distance (coded as 1, indicative of low walkability). Because the lists of destinations varied substantially across countries owing to cultural and geographical idiosyncrasies, individual destinations were collapsed into nine destination categories common to all countries and 13 destination categories common to 11 of the 12 countries (the UK had only nine of these categories). The nine categories common to all countries were: supermarket, small grocery or similar stores, post office, any school, transit stop, any restaurant, park, gym or fitness facility, and other stores and services. The 13 categories common to 11 of the 12 countries also included library, video store, drug store/pharmacy, and bookstore. Summary scores of land use mix – diversity are obtained by averaging ratings across the nine or 13 destination categories, giving a 9-destination and a 13-destination average score, respectively.
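A minimal sketch of the two steps described above, collapsing site-specific destinations into common categories and then averaging, is shown below. The text does not state how several site-specific items mapped to one category (e.g., two types of school) were combined; here we assume, purely for illustration, that a category takes the highest rating, i.e., the closest destination. Category labels paraphrase the text, and the function names are hypothetical.

```python
from statistics import mean

# Common destination categories as listed in the text (labels paraphrased).
COMMON_9 = (
    "supermarket", "small grocery", "post office", "any school",
    "transit stop", "any restaurant", "park", "gym or fitness facility",
    "other stores and services",
)
COMMON_13 = COMMON_9 + (
    "library", "video store", "drug store/pharmacy", "bookstore",
)


def collapse_destinations(item_ratings, category_map):
    """Collapse site-specific destination items into common categories.

    ASSUMPTION (for illustration): a category takes the highest rating,
    i.e. the closest destination, among the items mapped to it.
    """
    return {
        category: max(item_ratings[item] for item in items)
        for category, items in category_map.items()
    }


def land_use_mix_diversity(category_ratings, categories=COMMON_9):
    """Average 1-5 proximity ratings (5 = 1-5 min walk, 1 = >30 min)."""
    return mean(category_ratings[c] for c in categories)


school = collapse_destinations(
    {"primary school": 2, "secondary school": 4},
    {"any school": ["primary school", "secondary school"]},
)
print(school)  # {'any school': 4}
```

Passing `COMMON_13` as `categories` yields the 13-destination score for the 11 countries that had all 13 categories.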

The remaining eight perceived environmental attributes assessed by all IPEN Adult study sites were rated using the original 4-point Likert scale (1 = strongly disagree; 4 = strongly agree). Summary scores for these subscales are computed by averaging the scores on the corresponding items, after reverse-scoring items where necessary so that higher scores indicate higher walkability and safety. All countries included the single-item subscales Streets not having many cul-de-sacs and Physical barriers to walking (e.g., canyons, railways, freeways), and the 3-item Safety from crime subscale of the NEWS-A. All countries except Belgium and the UK used all items of the Land use mix – access subscale of the NEWS-A (Table 2). The 2-item Street connectivity subscale of the NEWS-A was included in all studies except those of Australia and Belgium, which had one of its items plus an additional item. Nine countries included the complete Aesthetics subscale, and eight the full Infrastructure and safety for walking/cycling subscale of the NEWS-A. The full NEWS-A Traffic safety subscale was included in eight countries (Table 2). A few countries included one or two Aesthetics or Traffic safety items that, albeit not identical, were potentially suitable substitutes for the original NEWS-A items (Table 2).
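The averaging rule for the 4-point subscales, including reverse-scoring, can be sketched as follows. The function and item names are hypothetical; the example assumes that all three Safety from crime items (high crime rate; unsafe to walk during the day; unsafe to walk at night) are worded so that agreement means less safety and therefore need reverse-scoring.

```python
def subscale_score(responses, reverse_items=()):
    """Mean of 4-point Likert items (1 = strongly disagree ... 4 = strongly agree).

    responses: dict of item name -> rating. Items named in reverse_items are
    reverse-scored as 5 - rating, so that higher scores always indicate
    higher walkability or safety.
    """
    if not responses:
        raise ValueError("no items to score")
    scored = []
    for item, rating in responses.items():
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"{item}: rating must be 1-4, got {rating!r}")
        scored.append(5 - rating if item in reverse_items else rating)
    return sum(scored) / len(scored)


# Hypothetical names for the three Safety from crime items.
crime = {"high_crime_rate": 1, "unsafe_walk_day": 1, "unsafe_walk_night": 2}
print(subscale_score(crime, reverse_items=crime.keys()))  # (4 + 4 + 3) / 3
```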

Socio-demographic characteristics

For the purpose of this paper, the following self-reported socio-demographic characteristics were considered: gender, age, educational attainment, and marital status.

Data analyses

Site-specific measurement models of the NEWS/NEWS-A

Individual-level, site-specific measurement models of the NEWS/NEWS-A were derived by conducting separate CFAs for each country on the responses to factor-analyzable items (all items except for those measuring Residential density and Land use mix – diversity). Area-level clustering effects arising from the two-stage sampling procedures used in all studies were accounted for by conducting CFAs on within-area variance/covariance matrices quantifying estimates of individual-level relationships between the items [17]. CFAs were based on the Maximum Likelihood Estimation method. A priori individual-level site-specific measurement models of the NEWS-A were formulated taking into consideration the available items across countries, their comparability (see Table 2), and findings of previous CFAs of the NEWS and NEWS-A [13, 16, 17, 25]. Measurement models including the following factors and single items were estimated:

1. Land use mix – access: two common (to all countries) items for the Belgian and UK models; three common items for the remaining 10 countries
2. Street connectivity: two common items, Australia and Belgium having a different combination of items than the remaining countries
3. Infrastructure and safety for walking and cycling: six common items for eight countries; five common items for Australia and Brazil; another combination of five common items for Belgium; and three common items for Hong Kong
4. Aesthetics: four common items for nine countries; three common items for Belgium and the UK; three common items and an alternative item for Australia (here, ‘alternative’ means not included in all countries)
5. Traffic safety: three common items for eight countries; two common items and an alternative item for Brazil; one common and two alternative items for Australia; and another combination of one common and two alternative items for Belgium and the Czech Republic
6. Safety from crime: three common items for all countries
7. Not many cul-de-sacs: a single common item for all countries
8. Physical barriers to walking: a single common item for all countries.
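The within-area variance/covariance matrices used to remove area-level clustering effects can be illustrated with one standard estimator, the pooled within-area covariance matrix S_PW = Σ_g Σ_i (y_gi − ȳ_g)(y_gi − ȳ_g)ᵀ / (N − G), where G is the number of areas. The NumPy sketch below assumes complete data; the exact computation performed for the IPEN analyses (e.g., its handling of missing responses) may differ, and the function name is hypothetical.

```python
import numpy as np


def pooled_within_covariance(items, area_ids):
    """Pooled within-area covariance matrix of item responses.

    items: (N, p) array of item responses (complete cases assumed).
    area_ids: length-N sequence of area (cluster) labels.
    Each item is centered on its area mean, so between-area differences in
    means do not contribute; cross-products are divided by N - G.
    """
    y = np.asarray(items, dtype=float)
    areas = np.asarray(area_ids)
    centered = np.empty_like(y)
    labels = np.unique(areas)
    for label in labels:
        mask = areas == label
        centered[mask] = y[mask] - y[mask].mean(axis=0)
    n_obs, n_areas = y.shape[0], labels.size
    return centered.T @ centered / (n_obs - n_areas)


# Two areas with identical within-area patterns but very different means:
# the pooled within-area matrix ignores the mean shift (every entry is 2.0).
y_demo = [[1, 2], [3, 4], [11, 12], [13, 14]]
print(pooled_within_covariance(y_demo, ["a", "a", "b", "b"]))
```

Because the area means are removed first, the resulting matrix reflects individual-level relationships between items, which is what the individual-level CFAs are fitted to.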

These eight factors were assumed to be inter-correlated. Jöreskog and Sörbom’s [29] iterative model-generating approach was used to re-specify the models, guided by inspection of standardized factor loadings, standardized residual covariances, univariate Lagrange multiplier tests, Wald tests, multivariate outliers, and theoretical considerations [29]. The goodness-of-fit of the measurement models was assessed using a combination of model-fit indices recommended by Hu and Bentler [30] and Kline [31], including the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). According to Hu and Bentler [30], values supportive of good model fit are ≥0.95 for CFI, ≤0.06 for RMSEA, and ≤0.08 for SRMR. Given that the CFI is sensitive to the magnitude of correlations between variables [31] and that co-occurring environmental attributes sometimes show only modest associations [17, 32], we treated CFI values ≥0.90 as indicative of acceptable model fit if the other two fit indices met Hu and Bentler’s [30] stricter criteria. We also reported the χ² test. EQS 6.2 [33] was used to conduct the CFAs.
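The fit criteria above can be made concrete with the usual formulas for the RMSEA point estimate and the CFI, combined with the decision rule described in the text. This is a sketch: it assumes N − 1 in the RMSEA denominator (some programs, possibly including EQS, use N) and an independence model as the CFI baseline; the function names and chi-square values are invented for illustration.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate; uses N - 1 (some programs use N instead)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index relative to the independence (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def fit_verdict(cfi_value, rmsea_value, srmr_value):
    """Hu and Bentler cut-offs, accepting CFI >= 0.90 only when RMSEA
    and SRMR meet their strict cut-offs (the rule described in the text)."""
    strict_others = rmsea_value <= 0.06 and srmr_value <= 0.08
    if strict_others and cfi_value >= 0.95:
        return "good"
    if strict_others and cfi_value >= 0.90:
        return "acceptable"
    return "not acceptable"


# Invented chi-square values, for illustration only.
print(round(rmsea(100.0, 50, 1001), 4))      # 0.0316
print(round(cfi(100.0, 50, 1050.0, 60), 4))  # 0.9495
print(fit_verdict(0.92, 0.04, 0.05))         # acceptable
```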

Comparability of various versions of NEWS-A subscales

The level of overlap between corresponding standard and alternative versions of the NEWS/NEWS-A factor-analyzable subscales was determined by assessing the strength of association (Pearson correlation coefficient) and the standardized mean difference (Cohen’s d) between them. This could be done only with data from countries that had included items from both the standard and alternative versions of the subscales. A correlation coefficient ≥0.80 (indicative of collinearity) [34] and an absolute Cohen’s d <0.25 (indicative of a very small effect size) [35] were considered supportive of substantial conceptual overlap between the versions of the subscales.

Results

Country-specific measurement models of the NEWS/NEWS-A

After deletion of up to six multivariate outliers per country, the a priori individual-level measurement models of the NEWS/NEWS-A showed acceptable fit to the data of seven of the 12 countries (Table 3). The CFI values for the remaining countries did not meet the set criterion (≥0.90). An inspection of the standardized factor loadings, standardized residuals, and Wald tests revealed that, in the case of the Czech Republic, Mexico, Spain, Denmark, and the United Kingdom, two items (Cars separating sidewalks and traffic; Grass/dirt separating sidewalks and traffic) did not load significantly, or loaded in the direction opposite to that expected, on the latent factor they were supposed to measure (Infrastructure and safety for walking/cycling). These two items also showed lower than desirable standardized loadings (<|0.30|) [36] for data from Australia, Colombia, and New Zealand. It was, thus, decided to omit them from all measurement models to ensure cross-country comparability. Apart from excluding the two problematic items, all models were re-specified by allowing item error terms to be correlated (where appropriate) and by constraining inter-factor correlations to zero where the data did not provide sufficient support for an association. All re-specified models fitted the data sufficiently well, with five measurement models also fully satisfying Hu and Bentler’s [30] stricter goodness-of-fit criteria (Table 3). All standardized factor loadings were significant (p < .001) in the expected direction (Table 4), and nearly all exceeded an absolute value of 0.30, indicating a substantial relationship [36]. All measurement models shared the same structure, with six latent factors and two single items (Table 4). Average inter-factor correlations were low. However, across the various models, the Street connectivity latent factor was moderately or highly correlated with Land use mix – access and Infrastructure and safety for walking/cycling (Table 4).
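The item-screening rules applied above (non-significant loading, loading in the opposite direction, or absolute standardized loading below 0.30) can be expressed as a simple filter. This is an illustrative sketch only: in the study, decisions also drew on standardized residuals, Wald tests, and theoretical considerations, and an item was dropped from all countries for comparability rather than per country. The function name, item names, and values are invented and are not study results.

```python
def flag_items(loadings, p_values, expected_sign, alpha=0.05, min_abs=0.30):
    """Flag items whose standardized loading is non-significant, has the
    wrong sign, or falls below min_abs in absolute value.

    loadings, p_values: dicts item -> standardized loading / p-value.
    expected_sign: dict item -> +1 or -1 (hypothesized direction).
    Returns dict item -> list of reasons (flagged items only).
    """
    flagged = {}
    for item, loading in loadings.items():
        reasons = []
        if p_values[item] >= alpha:
            reasons.append("not significant")
        if loading * expected_sign[item] < 0:
            reasons.append("opposite direction")
        if abs(loading) < min_abs:
            reasons.append("below |0.30|")
        if reasons:
            flagged[item] = reasons
    return flagged


# Invented values for three hypothetical items (not study results).
lam = {"item_a": 0.62, "item_b": -0.21, "item_c": 0.18}
p = {"item_a": 0.001, "item_b": 0.002, "item_c": 0.20}
sign = {"item_a": 1, "item_b": 1, "item_c": 1}
print(flag_items(lam, p, sign))
```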

Table 3 Goodness-of-fit indices for a priori and final re-specified individual-level, country-specific measurement models of the NEWS/NEWS-A
Table 4 Re-specified individual-level country-specific measurement models (standardized factor loadings) of the NEWS/NEWS-A

Comparability of versions of the NEWS/NEWS-A subscales

Due to differences in items and, hence, measurement models (Table 4), the NEWS/NEWS-A subscales of certain countries departed from the standard versions (Table 5). Comparability analyses showed a high level of correspondence between the standard and alternative versions of the following subscales: Land use mix – access, Infrastructure and safety for walking/cycling, and Aesthetics. Using data from countries with data on all relevant items, strong associations and small differences in means were found between scores on the standard and alternative versions of these three subscales. The alternative version of the Traffic safety subscale used by Brazil was also highly comparable to the standard version. The level of comparability of the remaining two versions of the Traffic safety subscale (for Australia, and for Belgium and the Czech Republic) was marginally acceptable, with average correlations and/or Cohen’s d at the limits of the acceptable range of values. The multi-item alternative versions of the Street connectivity subscale for Australia and Belgium did not provide a good match to their standard counterpart. However, single-item versions showed higher, yet marginally acceptable, levels of correspondence (Table 5).

Table 5 Comparability of alternative versions of NEWS/NEWS-A subscales with the standard versions

Discussion

The IPEN Adult project aims to conduct pooled analyses of associations of perceived environment with physical activity and health outcomes using data from 12 countries differing in built and social environments [11]. The significant between-country differences in socio-demographic, cultural, and environmental factors required cultural adaptations to the NEWS [12] or NEWS-A [13], the perceived neighborhood environment measures adopted by the IPEN. To assist pooled analyses, it was important to establish scoring protocols that maximize the amount of usable data and yield comparable NEWS/NEWS-A summary scores across countries. These protocols are also relevant to future studies using the NEWS/NEWS-A since they facilitate cross-study comparison and, hence, can contribute to a better understanding of environment-physical activity relationships.

Several between-country differences were observed on all subscales of the NEWS/NEWS-A except Safety from crime and the two single-item subscales. As detailed earlier, differences in the Residential density and Land use mix – diversity subscales were resolved by modifying the scoring protocols of their original versions so as to maximize inter-country comparability. For the remaining factor-analyzable items, well-fitting, comparable, individual-level measurement models consisting of eight distinct constructs (Table 4) were derived after omitting two problematic items (Cars separating sidewalks and traffic; Grass/dirt separating sidewalks and traffic). These findings provide further support for the robustness and generalizability of the factorial structure of the NEWS and its abbreviated form, the NEWS-A.

Interestingly, the two poor-fitting items mentioned above had some of the lowest standardized factor loadings in the measurement models of the original NEWS and NEWS-A [13, 25] and the Australian version of the NEWS [16]. In the present analyses, one or both items did not substantially load on any latent factor of the measurement models for Australia, Colombia, Denmark, Mexico, Spain, and the UK. Additionally, for the Czech Republic and Mexico, the item “Cars separating sidewalks and traffic” showed an unexpected negative loading on the factor it was supposed to measure (Infrastructure and safety for walking/cycling). Indeed, the presence of parked cars along sidewalks may in some cases be indicative of higher volumes of vehicular traffic and, hence, reflect traffic hazards rather than safety. This conjecture was confirmed by field observations made by the two respective research teams. Given the ambiguous interpretation of these two items across IPEN countries, it was decided to exclude them from all country-specific measurement models. Thus, we do not recommend using them in the IPEN pooled analyses, although they may still be relevant in within-country analyses.

An issue worth mentioning pertains to the strong correlations observed between Street connectivity and the factors of Land use mix – access and Infrastructure and safety for walking/cycling, which exceeded 50% of shared variance (i.e., |r| > 0.71) in four countries. This raises multicollinearity concerns for multi-predictor regression models including all three subscales as explanatory variables, at least for the countries where this appears to be a significant problem (Brazil, Colombia, Hong Kong, and the UK). Yet, these correlations represent associations between latent factors rather than between subscale scores, the latter being used in regression analyses. Latent factors include only the items’ communalities (the portions of the items’ variances that are accounted for by a factor), whereas subscale scores, representing the average rating on the relevant items, also include the items’ uniquenesses, which are in part random error variance [36]. Thus, correlations between latent factors are usually higher than those between the corresponding ‘raw’ subscale scores. Post-hoc analyses revealed that this was also the case in this study: the correlations between scores on the Street connectivity subscale and the other two potentially collinear subscales ranged from 0.23 to 0.41. Hence, the simultaneous inclusion of these three subscales as explanatory variables in regression models should not create multicollinearity problems.
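The reasoning that latent-factor correlations overstate correlations between averaged subscale scores can be illustrated with the classical test theory attenuation formula, r_observed ≈ r_latent × √(ρ_xx × ρ_yy), where ρ_xx and ρ_yy are the reliabilities of the two subscale scores. This is a simplifying approximation introduced here for illustration, not the post-hoc analysis performed in the study, and the latent correlation and reliabilities below are invented.

```python
import math


def attenuated_correlation(latent_r, reliability_x, reliability_y):
    """Expected correlation between two observed composite scores under
    classical test theory, given their latent correlation and reliabilities."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return latent_r * math.sqrt(reliability_x * reliability_y)


def shared_variance(r):
    """Proportion of variance two variables share (r squared)."""
    return r ** 2


latent = 0.75
print(shared_variance(latent))  # 0.5625, i.e. more than 50% shared variance
observed = attenuated_correlation(latent, 0.6, 0.7)
print(round(observed, 3))  # 0.486
```

With moderately reliable subscales, a latent correlation sharing over half of the variance corresponds to an observed correlation sharing less than a quarter, which is the pattern the post-hoc analyses found.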

As noted earlier, six of the 12 countries did not use standard versions of the NEWS/NEWS-A subscales. An analysis of the level of comparability of the various versions of the subscales revealed some potential concerns with the Australian and Belgian versions of (single-item) Street connectivity and Traffic safety, and the Czech version of Traffic safety. Specifically, the average correlations between the standard and alternative versions of the subscales only just met the adopted comparability criterion (≥0.80), and substantial variability was observed in the individual country-specific correlations. This has some implications for the interpretation of results from pooled analyses with respect to these subscales. Namely, any between-country (Australia, Belgium, and the Czech Republic vs. other countries) differences in the associations of these two perceived environmental attributes with outcome measures (detectable as significant country-by-perceived-attribute interaction effects) may be partly due to measurement differences. Hence, caution will be needed in interpreting such interaction effects (if any), especially if they are small in magnitude.

Also of concern are the observed differences in mean scores between the standard and the two alternative versions of the Traffic safety subscale. Although the alternative subscales had acceptable or marginally acceptable average effect sizes, some country-specific effect sizes were too large, suggesting that differences in scores between the standard and alternative subscales for Australia, Belgium, and the Czech Republic (had data on all relevant items been available from these countries) might be substantial. Thus, pooled analyses of between-site differences in average perceived neighborhood environmental attributes will need to take into account the possibility that the presence or absence of differences in perceived traffic safety between these three sites and the remaining sites may be due to differences in measures.

Given the above, we recommend that, for the purpose of conducting pooled analyses on data from the 12 IPEN countries, the factor-analyzable NEWS/NEWS-A subscales be scored according to the respective country-specific measurement models presented in Table 4, with the exception of the Australian and Belgian versions of the Street connectivity subscale, which, for these two countries, should consist of a single item (see the algorithms presented in Table 6). The Residential density and Land use mix – diversity subscales should be scored according to the algorithms shown in Table 6, which are based on the analyses and remarks presented earlier (see Table 2 and the Methods section). We also recommend that future studies employing country-specific versions of the NEWS/NEWS-A use the protocols proposed here.

Table 6 Country-specific scoring of NEWS/NEWS-A subscales for pooled analyses

Limitations

The main limitations of this study pertain to differences across countries in participant recruitment procedures, survey administration mode, and the versions of the NEWS/NEWS-A used. However, differences in the ways participants were recruited are unlikely to have had a systematic impact on the inter-item associations and, thus, on the measurement models of the NEWS/NEWS-A. Additionally, nearly all countries used the same stratified two-stage sampling procedure to recruit participants from neighborhoods selected on the basis of their socio-economic and walkability levels. This likely contributed to the recruitment of relatively comparable samples across countries in terms of educational attainment and exposure to low- vs. high-walkable environments, two characteristics that may affect the accuracy and variability of responses to measures of perceived neighborhood environment [15, 37, 38]. Self-administered surveys may have yielded data that were less affected by social desirability and more consistent than interview data [39]. Yet, differences in survey findings across modes of administration are generally small and more pronounced for sensitive topics [39], and the items included in the NEWS/NEWS-A (e.g., access to services and traffic safety) are unlikely to be perceived as sensitive.

The fact that the 12 IPEN countries did not use exactly the same group of survey items precluded a more rigorous assessment of the cross-country equivalence of the NEWS/NEWS-A, whereby measurement model fit would be assessed while progressively increasing the number of constrained parameters (i.e., parameters constrained to be equal across IPEN countries), starting from factor loadings and finishing with the variances of error terms [40]. With the available data, we were only able to demonstrate cross-country configural equivalence of the NEWS/NEWS-A, i.e., that all measurement models consisted of the same latent factors. Configural equivalence is an essential requirement for the conduct of pooled analyses based on NEWS/NEWS-A data.

Conclusions

To improve inter-country comparability and allow pooled analyses of data in investigations of associations of perceived neighborhood environment with physical activity and health outcomes, we have proposed modifications to the original scoring protocol of the NEWS/NEWS-A and have established country-specific, comparable measurement models to be employed in future analyses. A few potential inter-country discrepancies remain with respect to the measurement of street connectivity and traffic safety, which need to be considered when interpreting findings from pooled analyses and comparing findings from different countries. We recommend that future studies using the NEWS/NEWS-A implement the scoring protocol proposed here to facilitate cross-study comparability and interpretation of the findings. Importantly, the analytical approach presented in this paper could also be used by other multi-site projects that have variations in their measurement protocol and require optimization of data inclusion and comparability for pooled analyses and cross-site comparisons.