Background

Deprivation has been defined by Townsend as ‘a state of observable and demonstrable disadvantage relative to the local community or the wider society or nation to which an individual, family or group belongs’ [1]. It is a concept with two distinguishable domains: material and social circumstances. The former relates to diet, health, clothing, housing, household facilities, environment and work. The latter is more difficult to measure, as it relates to different forms of relationships, such as community integration, leisure, and formal participation in social institutions [1].

A deprivation index (DI) is a composite measure: no single variable can be said to measure deprivation; rather, a number of variables each contribute in some way [2]. Area-level indicators are commonly used to evaluate the geographical distribution of socioeconomic inequalities in health [3–6]. They have also been used as proxies for individual-level data when individual measures are not available (e.g. when census data are only provided at aggregated levels for reasons of confidentiality) [7].

Approaches to ecological small-area analysis propose using the smallest practicable spatial scale for any study on inequalities in health [8, 9]. This diminishes the ecological bias, as the analysis is closer to the individual level [8, 10, 11]. Most studies that have focused on area-level social factors use geographical boundaries, developed for censuses, as proxies for actual communities or neighborhoods [6]. Administrative boundaries allow for routinely collected data to be available at the smallest scale. In Spain, census tracts, the smallest administrative unit, tend to have a mean population of 1,000 subjects [12].

There are three main methods for constructing deprivation indices (DIs) [13, 14]. The first uses standardized z-scores or log transformations of variables selected a priori; it was very popular until the late 1980s [15, 16]. The second is Principal Component Analysis (PCA), which has been the principal method over the last twenty years. PCA transforms a number of possibly correlated variables into a smaller number of uncorrelated variables, called principal components, that retain a large percentage of the total variance contained in the original data and are then used to measure socio-economic status [17–20]. The third method, the least common, uses feedback from health experts to assign weights to the selected variables [21–24].
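As an illustration of the first method, the following minimal sketch builds an additive index by summing column-wise z-scores. The function name, the equal weighting and the data are assumptions made for illustration only; they are not taken from any of the cited indices.

```python
from statistics import mean, pstdev

def zscore_index(rows):
    """Sum of column-wise z-scores per area (rows = areas, columns = indicators)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # population SD; assumes no constant column
    return [
        sum((value - m) / s for value, m, s in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical data: three areas, two indicators (e.g. % unemployed, % low education).
areas = [[5.0, 10.0], [10.0, 20.0], [15.0, 30.0]]
index = zscore_index(areas)  # larger values indicate more deprivation
```

Because every indicator receives the same weight, two strongly correlated indicators count twice in such an index, which is the redundancy issue that PCA and DP2 try to address.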

The increased use of DIs has allowed for the identification of deprived areas and of their association with morbidity and mortality; living in a deprived area acts as an effect modifier. In fact, some studies confirm that people living in deprived areas have higher rates of ill health and mortality [4, 5, 8, 25, 26]. However, material deprivation is manifested in higher rates of mortality differently by gender [27, 28]. This might be due in part to the fact that men and women perceive their environment differently [29–31], may be exposed to different local environments [32], and may differ in their vulnerability to aspects of the local environment [33].

In this study we propose improvements to the method of elaborating DIs. First, in the selection of the variables, we incorporated a wider range of both objective and subjective measures, which makes for a more complete DI. Several studies that analyzed deprivation-associated excess mortality found that men had higher rates of excess mortality than women [27, 28]. A large proportion of excess deaths was found in diet-related causes of death in females and in causes related to smoking and alcohol consumption in males [27]. However, inferences about gender differences in relation to deprivation and mortality are sensitive to the choice of inequality measures used [34–36], making it important to incorporate gender-sensitive indicators (i.e. residential/environmental indicators, as they are strongly related to women’s health, and individual economic activity indicators, as they are associated with men’s health) [32]. The strength of association between socio-economic and health measures tends to be greater for men than for women [35]. Thus, contextual indicators are crucial if we want to have a comparative measure between men’s and women’s health.

Studies have shown that neighborhood context influences self-rated health, especially in women, beyond the effects of individual factors [33, 37]. Neighborhoods with worse socioeconomic conditions have a negative effect on health [38]. Additionally, the residential environment may be more important for women’s health, while individual economic activity is more important for men’s health [32]. Other studies have found that occupational factors are more important for men’s health, whereas the home environment is more important for women’s health [39].

Second, in the statistical methodology, we used a distance indicator (Pena Distance, or DP2) instead of PCA, the leading aggregation method. DP2 overcomes several limitations of PCA, for instance those related to aggregating variables expressed in different units of measurement, arbitrary weights, and duplicate information [40–45]. The greatest advantage of DP2 over PCA lies in how each handles redundant information. In PCA, the first component (i.e. the DI) is composed of the variables that capture the greatest variability, which leaves some variables out of the index. In DP2, the index is composed of each and every variable, leaving out only the redundant information. Thus, in principle, DP2 would capture more variability [45].

In addition, we propose another methodological improvement, which consists in the use of a more robust statistical method to assess the relationship between deprivation and health responses in ecological regressions.

In this sense, we focused on mortality from cancer of the trachea, bronchus and lung and from diabetes, because there is some evidence that gender could act as an effect modifier in their relationship with deprivation, and on breast and prostate cancer, given their known association with deprivation. Several studies have shown a positive association between lung cancer mortality and deprivation for men and a negative association for women [18, 28, 46]. The explanation is that, in men, lower socioeconomic status implies a higher prevalence of smoking, which leads to an increased risk of lung cancer mortality in individuals with lower socioeconomic status. In women, on the other hand, smoking is more prevalent among those with higher socioeconomic status, leading to an increased risk of dying from that cause in individuals with higher socioeconomic status. Regarding diabetes, individuals with high educational levels have a significantly lower risk of dying from this cause (odds ratio, OR = 0.66) than those with low educational levels [47]. In the Whitehall study [48], the gradient observed for mortality from diabetes mellitus was similar to that for the other pathologies studied: mortality from diabetes mellitus was higher in individuals at the lowest levels of the hierarchy. This excess mortality appeared to be due to the increased frequency of cardiovascular risk in this group. In addition, variations in mortality from type 2 diabetes mellitus (DM2) have also been observed at the level of the area of residence. Individuals living in more deprived areas tend to have worse lifestyles (higher body mass index, BMI, and greater prevalence of smoking), which could lead to worse glycemic control and a greater number of complications associated with diabetes mellitus. As in the case of lung cancer mortality, this association could be modified by sex.

However, the results from the Spanish literature, at least those from studies that use a spatial adjustment [18, 28, 46], do not systematically coincide with each other, particularly for prostate cancer and breast cancer. Diabetes mortality has shown a positive association with deprivation for both men [28, 49] and women [49]. Borrell et al. [18] and Puigpinós-Riera et al. [46] found a positive association for breast cancer in Alicante (relative risk, RR = 1.55; 95% confidence interval, 95% CI: 1.04-2.19), yet the majority of other cities showed a negative association that was not statistically significant, except in Vigo (RR = 0.54; 95% CI: 0.33-0.84). Benach et al. [27] found a negative association between breast cancer and deprivation with Index 1, which measured deprivation through unemployment, illiteracy and low social class; however, they found no association with Index 2, which measured deprivation through overcrowding and a small component of unemployment and illiteracy. For prostate cancer, some studies found no clear association between mortality risk and deprivation [46]. Benach et al. [27] found a negative association between prostate cancer and deprivation with Index 1 but no association with Index 2.

Our hypothesis is that the differences found in the literature are caused mainly by the lack of robustness of the statistical methodology and of the spatial adjustment applied. Therefore, our objective in this paper is to evaluate a more robust methodology, both in how DIs are developed (the selection of component variables and how they are aggregated) and, above all, in the statistical method used to analyze how gender modifies the relationship between deprivation and mortality.

Methods

Design and study population

Most countries in the world periodically carry out population censuses that gather information on socio-demographic characteristics of the population resident in a country. In Spain, the Spanish Census of Population and Housing (CPH) conducted in 2001 reported nearly 41 million inhabitants. The province capital of Barcelona had over 1.5 million inhabitants and an average number of 1,201.33 inhabitants per census tract (standard deviation, SD, 526.66; Median 1,085.50). For this study, we conducted an ecological small-area analysis based on the residents of the Metropolitan region of Barcelona. A total of 2,978 census tracts were examined.

Variables

Standardized mortality rates (SMR), stratified by sex, were studied for four causes of death: cancer of the trachea, bronchus and lung (ICD-10: C33-C34), diabetes mellitus (ICD-10: E10-E14), breast cancer (ICD-10: C50), and prostate cancer (ICD-10: C61). The mortality data were provided by the Catalan Mortality Registry. We only used death certificates of residents of the Metropolitan Region of Barcelona who died between 1994 and 2007.

Deprivation index

Socioeconomic conditions were summarized using a DI at the census tract level. Sixteen socio-demographic variables available in the Spanish CPH were included. Four of these (unemployment rate, percent with low educational level, manual workers and temporary workers) have been used in several DIs for small-area studies in Spain [46, 50]. The other variables (university education, mono-parental homes, activity rate, immigrants, homes without heating, no toilet/bathroom in the home, bad communications, vandalism/crime, few green areas, exterior noise, contamination/bad smells, and dirty streets) were chosen based on literature that focused on the association between contextual indicators and their effect on health, particularly studies that analyzed gender differences [38]. We chose contextual indicators, most of them subjective, that were representative of the neighborhood characteristics (Table 1).

Table 1 Mean and Percentile distribution of socioeconomic indicators over census tracts of the Barcelona Metropolitan Region, 2001

The DI was constructed by aggregating the above-mentioned variables using a distance indicator, DP2, instead of the standard PCA method. DP2 is an iterative procedure that weights partial indicators depending on their correlation with the global index [41, 45, 51]. It captures all the non-redundant variance of the indicators (i.e. it avoids multicollinearity), thus allowing a greater number of variables to be included [41, 42]. Because DP2 uses all the valuable information contained in the partial indicators, adding variables makes the final DI more complete, since each variable contains unique information not present in the others [44]. This advantage of DP2 over extracting an index through PCA is magnified as the number of variables grows, because a single component then does not suffice for the index. More than one component must be retained when applying PCA, which results in an arbitrary choice of weights for aggregating the components into the index. Pena [41] and Zarzosa [42] showed that DP2 fulfills all the properties of a good composite indicator: monotony, unicity, invariance, homogeneity, transitivity, exhaustivity, existence and determination, and additivity.

The process for aggregating the variables using DP2 is a four-step process previously described in detail [51].
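The way DP2 discards redundant information can be sketched as follows. This is a minimal illustration under stated assumptions, not the four-step procedure of [51]: it uses the usual textbook form of DP2, in which each indicator contributes its standardized distance to a reference value multiplied by (1 − R²), where R² is the share of its variance explained by the indicators entered before it. The reference value (the column minimum), the fixed entry order and the function name `dp2` are assumptions; the iterative re-ordering of indicators by their correlation with the index is deliberately omitted.

```python
from math import sqrt

def dp2(rows, order=None):
    """DP2-style distance for a fixed entry order of the indicators (columns)."""
    n_areas, n_cols = len(rows), len(rows[0])
    order = list(range(n_cols)) if order is None else list(order)
    scores = [0.0] * n_areas
    basis = []  # orthonormal basis of the centred indicators entered so far
    for j in order:
        column = [row[j] for row in rows]
        mean = sum(column) / n_areas
        centred = [value - mean for value in column]
        sum_sq = sum(c * c for c in centred)  # assumed > 0 (non-constant indicator)
        sd = sqrt(sum_sq / n_areas)
        # R^2 of this indicator regressed on the earlier ones, via projection.
        coefs = [sum(c * q for c, q in zip(centred, vector)) for vector in basis]
        r2 = min(1.0, sum(k * k for k in coefs) / sum_sq)
        # Distance to the reference value (assumed here: the column minimum).
        reference = min(column)
        for a, value in enumerate(column):
            scores[a] += abs(value - reference) / sd * (1.0 - r2)
        # Gram-Schmidt step: keep only the new, non-redundant direction.
        residual = [
            c - sum(k * vector[a] for k, vector in zip(coefs, basis))
            for a, c in enumerate(centred)
        ]
        norm = sqrt(sum(r * r for r in residual))
        if norm > 1e-10 * sqrt(sum_sq):
            basis.append([r / norm for r in residual])
    return scores

# Hypothetical data: the second indicator is an exact multiple of the first.
areas = [[1.0, 2.0, 5.0], [2.0, 4.0, 1.0], [3.0, 6.0, 3.0]]
scores = dp2(areas)
```

In this toy example the second indicator adds nothing to the index because it is fully explained by the first, yet it is not dropped a priori; only its redundant information is discarded.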

Data analysis

When spatial data are available, the variability in the observed response is usually greater than expected, resulting in overdispersion. It is important to distinguish between two sources of this extra variability [52]. The first and most important source is the so-called 'spatial dependence', a consequence of the correlation between a spatial unit and its neighboring spatial units, generally contiguous geographic areas: nearby areas are more similar than those far apart. Part of this dependence is not really structural, but is mainly due to variables not included in the analysis. The second source is independent, spatially uncorrelated extra variability, called (non-spatial) heterogeneity; it is the result of unobserved variables without spatial structure that could influence the response [52, 53].

To account for this extra variability, it is necessary to introduce some structure in the model. Otherwise, unless the model is linear, the estimates of the parameters of interest and the standard errors of the estimators will be biased; therefore any inference based on them, will be wrong [54].

In this paper we have chosen to use hierarchical Bayesian models, more specifically a model based on Besag, York and Mollié [55, 56] (BYM) to analyze the relationship between mortality and small-area deprivation.

The idea is to introduce two random effects in a generalized linear model with a Poisson response in order to capture the extra variability [53].

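$$
\begin{aligned}
O_i &\sim \mathrm{Poisson}(\mu_i \, \mathrm{Pop}_i) \\
\log(\mu_i) &= \alpha + \log(\mathrm{Pop}_i) + \sum_{k=1}^{4} \beta_k \, \mathrm{IndexQ}_{ki} + \upsilon_i + S_i + \beta_5 \, \mathrm{Pop4564}_i + \beta_6 \, \mathrm{Pop65m}_i
\end{aligned}
\tag{1}
$$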

where $O_i$ denotes the observed cases of the response variable in census tract $i$; $\mu_i$ is the relative risk in census tract $i$; $\mathrm{Pop}_i$ is the population (men or women) of the census tract; $\upsilon_i$ is the random effect that reflects the heterogeneity; $S_i$ is the random effect that reflects the spatial dependence; $\alpha$ is the intercept, interpreted as the logarithm of the baseline risk; $\mathrm{IndexQ}_{ki}$ is the quintile of the (standardized) deprivation index in census tract $i$ (the first quintile was taken as the reference); the $\beta$ are the parameters of the model, which can be interpreted as the logarithms of the relative risks associated with the explanatory variables; $\mathrm{Pop4564}_i$ is the percentage of men (or women) aged 45 to 64 years in census tract $i$; and $\mathrm{Pop65m}_i$ is the percentage of men (or women) aged 65 years or older in census tract $i$.
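Since the coefficients are logarithms of relative risks, a relative risk and its interval are obtained by exponentiating the coefficient and its limits. The short sketch below shows this conversion; the coefficient and interval values are hypothetical and do not come from the fitted models.

```python
from math import exp

def relative_risk(beta):
    """Convert a log-relative-risk coefficient into a relative risk."""
    return exp(beta)

# Hypothetical posterior summary for one quintile coefficient (log scale):
# point estimate and lower/upper 95% credible limits.
beta_mean, beta_low, beta_high = 0.27, 0.17, 0.36

rr = relative_risk(beta_mean)
rr_interval = (relative_risk(beta_low), relative_risk(beta_high))
# A relative risk above 1 means higher mortality than in the reference quintile (Q1).
```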

The DI was categorized into quintiles because the influence of deprivation on the (spatial) variation of mortality could be non-linear.
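A minimal sketch of this categorization follows, assuming rank-based quintiles and indicator (dummy) coding with the first quintile as the reference, which matches the four quintile coefficients in equation (1). The function names and the index values are hypothetical.

```python
def quintile_labels(values):
    """Assign each value to a quintile 1..5 by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

def quintile_dummies(label):
    """Indicator coding for Q2..Q5; Q1 is the reference (all zeros)."""
    return [1 if label == q else 0 for q in range(2, 6)]

# Hypothetical deprivation index values for ten census tracts.
index_values = [12.0, 3.5, 7.1, 9.9, 1.2, 5.5, 8.3, 15.0, 4.4, 10.7]
labels = quintile_labels(index_values)
design_rows = [quintile_dummies(label) for label in labels]
```

Each quintile then receives its own coefficient, so the estimated effect of deprivation is not forced to increase linearly across the range of the index.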

The non-spatial random effect, also called heterogeneity, is assumed to be normally distributed with a mean of zero and constant variance. For the spatial random effect, a conditional autoregressive (CAR) model is used [57, 58]. This approach, which is the most widely used and has the lowest computational cost, approximates the spatial dependence as an average spatial effect of neighboring areas [53]. The areas considered are the census tracts of each municipality in the Metropolitan Region of Barcelona, and neighboring areas are defined as adjacent census tracts, in other words tracts that share a border.
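The averaging over neighbors can be illustrated with the intrinsic CAR form commonly used in BYM models, in which the spatial effect of an area, conditional on the others, is normal with mean equal to the average of its neighbors' effects and variance inversely proportional to the number of neighbors. That this exact form was used is an assumption here; the adjacency structure, effect values and function name are hypothetical.

```python
def car_conditional(i, effects, neighbours, variance):
    """Conditional mean and variance of the spatial effect of area i given its neighbours."""
    adjacent = neighbours[i]
    mean = sum(effects[j] for j in adjacent) / len(adjacent)
    return mean, variance / len(adjacent)

# Hypothetical map of four tracts; each key lists the tracts sharing a border with it.
neighbours = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
effects = {0: 0.2, 1: -0.1, 2: 0.4, 3: 0.0}

mean_1, var_1 = car_conditional(1, effects, neighbours, variance=0.5)
```

Tracts with more neighbors have a smaller conditional variance, so their spatial effects are smoothed more strongly towards the local average.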

Note that the specified model does not use the number of expected cases in the census tract as an offset, but rather its population (men or women). This is because, unlike the standard BYM model [53], here we use the crude death rate (of the census tract), and not the standardized mortality ratio, as the indicator of mortality. The reason is to avoid the problem called 'mutual standardization' [59, 60]. Rosenbaum and Rubin [59] show how the use of standardized rates as the response variable in ecological regression models leads to biased results if only the response, and not the predictors, is adjusted for the same confounder, usually the age distribution. When the predictor is not adjusted, it is implicitly assumed that the effect of the predictor is constant across all strata of the confounding variable. This may be true for the effect of an air pollutant, which in principle is the same for all ages, but not for the deprivation index. Grisotto et al. [60], in line with Rosenbaum and Rubin [59], show that unbiased estimators can be obtained by adjusting the response and the predictors (the deprivation index in this case) for the same variable (age distribution) or, more simply, by using crude rates as the response variable and entering age (as an average or as an age structure) as an explanatory variable in the model. This is why we have introduced the age structure of the census tract (the proportion of men or women aged 45 to 64 years and 65 years or more). Introducing age also allows its effect to be controlled for in the model. This modification overcomes methodological limitations of the standard implementation of the BYM model [18].
Moreover, because the estimates associated with the variables representing the population aged 45 to 64 years and the population aged 65 years or more were not significant in any of the cases studied, we can say that the risks found do not differ by age group.

A second difference from the standard BYM model, not so obvious, is the use of standardized explanatory variables. The reason is that the spatial random effect, approximated by CAR, can be correlated with some (or all) explanatory variables that have a similar spatial dependence (known as concurvity). If this were the case, there would be an over-adjustment, unlike the phenomenon of multicollinearity in linear models, which could bias the estimates [61]. Simulations performed by us suggest that the problem could be solved by completely standardizing the potentially problematic explanatory variable. On the other hand, the introduction of the deprivation index (standardized) into quintiles, in addition to collecting a possible nonlinear effect, could mitigate much of the problem.

Spatial models were built as Bayesian hierarchical models with two stages [62] and estimated using the integrated nested Laplace approximation (INLA) [62–64] (see Appendix: Annex).

Models were compared using the Deviance Information Criterion (DIC) [65] and the conditional predictive ordinate (CPO) for each observation (in fact, -mean(log(cpo))) [66, 67]. CPO is a cross-validated predictive approach, i.e. it uses predictive distributions conditioned on the observed data with a single data point deleted. Asymptotically, the CPO statistic has a similar dimensional penalty to DIC, so the two criteria can be expected to behave similarly. In both cases, the lower the DIC or the CPO statistic, the better the model.
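The CPO summary can be computed as in the sketch below, which averages the negative logarithms of the per-observation CPO values. The CPO values shown for the two models are hypothetical, and the function name is an assumption.

```python
from math import log

def cpo_score(cpo_values):
    """Return -mean(log(cpo)); lower values indicate better predictive performance."""
    return -sum(log(c) for c in cpo_values) / len(cpo_values)

# Hypothetical per-observation CPO values for two competing models.
model_a = [0.30, 0.25, 0.40, 0.10]
model_b = [0.20, 0.15, 0.35, 0.05]

score_a = cpo_score(model_a)
score_b = cpo_score(model_b)
preferred = "A" if score_a < score_b else "B"
```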

No experimental research was involved in this work. All computations were carried out using the INLA interface [68], running directly in R (version 2.11.0) [69].

Results and discussion

According to the Spanish CPH of 2001 the average number of inhabitants per census tract in the metropolitan region of Barcelona was 1201.33 (standard deviation 526.55; median 1085.50).

Table 1 describes the distribution of the socioeconomic indicators used to construct the DI, by gender, in the Metropolitan Region of Barcelona in 2001. We observed a high percentage of temporary workers (20.9% women, 18.0% men) and manual workers (51.6% women, 56.0% men) among both men and women. Women had a higher rate of insufficient education (17.1% vs. 12%) and a lower activity rate (57.1% vs. 67.3%) than men. Over a third of the individuals claimed their neighborhoods had dirty streets, exterior noise, few green areas and vandalism/crime.

Tables 2 and 3 show the associations between cause-specific mortality and deprivation by quintiles of the index, for men and women controlled for age. In the case of women (Table 2), we observed a positive association for diabetes mortality. However, both lung cancer mortality (RR in Q5 = 0.78; 95%CI: 0.65-0.93) and breast cancer mortality (RR in Q5 = 0.81; 95%CI: 0.71-0.92) had an inverse relationship with socioeconomic deprivation; women with greater deprivation had less mortality risk (statistical significance was present only in the fifth quintile).

Table 2 Association between three mortality causes and the socioeconomic deprivation index in women
Table 3 Association between three mortality causes and the socioeconomic deprivation index in men

For men (Table 3), we observed a positive association for lung cancer mortality: the greater the socioeconomic deprivation, the greater the mortality risk (RR in Q5 = 1.31; 95% CI: 1.19-1.43). For diabetes mortality we also observed a positive association between mortality risk and deprivation; however, statistical significance was found only in the second and third quintiles. For prostate cancer mortality, on the other hand, there was no systematic relationship between deprivation and mortality risk: only the second quintile was significant, with an RR of 1.13 (95% CI: 1.00-1.26).

Overall, our results are consistent with those provided by the literature. At greater deprivation, there is an increased risk of dying from diabetes for both sexes (although in the case of men the relative risks associated with the third quintile of deprivation onwards were not statistically significant) and of dying from lung cancer for men. On the other hand, at greater deprivation, there is a decreased risk of dying from breast cancer and lung cancer for women (although in both cases only the relative risk associated with the top quintile of deprivation was statistically significant). We did not find a clear relationship in the case of prostate cancer (presenting an increased risk but only in the second quintile of deprivation).

This study has improved the statistical methodology applied in building deprivation indices, as well as the robustness of spatial adjustments in ecological studies. Our results were consistent with the existing literature on the association between deprivation and mortality (lung cancer, diabetes mellitus, breast cancer and prostate cancer); however, our focus was on urban small areas. As some literature suggests, mortality differs in rural versus urban settings, particularly in women [70]; thus, it would be important to explore geographic differences in more detail in future studies. In addition, future studies should also focus on researching these same causes of death in other European cities and additional causes of death in the Barcelona Metropolitan Region.

Conclusions

We believe our results were obtained using a more robust methodology. First, we have built a better index, one that allows us to directly capture the variability of contextual variables without having to use arbitrary weights, as is required when components extracted through PCA are aggregated. Building an index using PCA is adequate, and the results do not differ from those obtained by DP2, when only a few variables are included, because the first component then captures enough variability. When a large number of variables is used, as in this article, a single component is not enough to capture the variability contained in the variables, and the index must be constructed by aggregating two or more components. By using DP2 we avoided the arbitrariness in the choice of aggregation weights, because the algorithm naturally captures all the variability contained in the variables, excluding only the shared part and not the variable itself.

Secondly, we have solved two major problems present in spatial ecological regressions (i.e. those that use spatial data and, consequently, perform a spatial adjustment in order to obtain consistent estimators): the problem of mutual standardization and the problem of concurvity. As stated previously, concurvity is present when the spatial random effect is correlated with some (or all) of the explanatory variables that have a similar spatial dependence. Using crude rates as the mortality indicator for the response variable and introducing age as an explanatory variable in the model allows us to obtain unbiased estimators. Furthermore, introducing age in the model allows us to monitor its effect on the results. On the other hand, introducing the (standardized) index as quintiles, in addition to capturing a possible nonlinear effect, could mitigate much of the problem of concurvity.

One limitation of this study is the limited selection of variables, due to the lack of statistical information in censuses. Additionally, the deprivation index was computed using information from the 2001 Census of Population and Housing (the only information available), while the mortality data cover the period 1994–2007. Some characteristics recorded in 2001 may not capture the real environment of the areas in the past, when the deaths occurred. However, the data collection period for the construction of the index corresponds to the midpoint of the period in which the mortality data were collected, and the relative position of a census tract with respect to the deprivation index (i.e. the quintile of the index in which it was located) remains very stable over time. It is also important to point out that, given that the data are over a decade old, the present patterns of association may have changed, primarily due to the increase in immigration since 2001 and the decline in the activity rate caused by the economic crisis.

Appendix

Annex

Spatial models were built as Bayesian hierarchical models with two stages [62, 63]. The first stage was the observational model $p(y \mid x)$, where $y$ denoted the vector of observations and $x$ the unknown parameters, which follow a Gaussian Markov random field (GMRF) denoted as $p(x \mid \theta)$. The second stage was given by the hyperparameters $\theta$ and their respective prior distribution $p(\theta)$. The desired posterior marginals

$$
p(x_i \mid y) = \int_{\theta} p(x_i \mid \theta, y) \, p(\theta \mid y) \, d\theta
\tag{2}
$$

of the GMRF were approximated using the finite sum

$$
\tilde{p}(x_i \mid y) = \sum_{k} \tilde{p}(x_i \mid \theta_k, y) \, \tilde{p}(\theta_k \mid y) \, \Delta_k
\tag{3}
$$

where $\tilde{p}(x_i \mid \theta, y)$ and $\tilde{p}(\theta \mid y)$ denoted approximations of $p(x_i \mid \theta, y)$ and $p(\theta \mid y)$, respectively. The finite sum (3) was evaluated at support points $\theta_k$ using appropriate weights $\Delta_k$.
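The finite sum can be illustrated numerically with a toy example. In the sketch below the conditional density is a stand-in (a zero-mean Gaussian whose standard deviation is the hyperparameter), and the support points, weights and unnormalized hyperparameter densities are all hypothetical; none of this reproduces the INLA implementation.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def marginal_density(x, thetas, theta_density, deltas, conditional):
    """Finite-sum approximation: sum_k p(x | theta_k, y) * p(theta_k | y) * Delta_k."""
    return sum(
        conditional(x, theta) * density * delta
        for theta, density, delta in zip(thetas, theta_density, deltas)
    )

# Toy inputs (all hypothetical): three support points for a scalar hyperparameter.
thetas = [0.5, 1.0, 2.0]
deltas = [0.5, 0.75, 1.0]
unnormalised = [1.0, 2.0, 0.5]

# Rescale so that sum_k p(theta_k | y) * Delta_k equals 1.
total = sum(u * d for u, d in zip(unnormalised, deltas))
theta_density = [u / total for u in unnormalised]

def conditional(x, theta):
    # Stand-in for p(x_i | theta, y): zero-mean Gaussian with standard deviation theta.
    return normal_pdf(x, 0.0, theta)

value_at_zero = marginal_density(0.0, thetas, theta_density, deltas, conditional)
```

The approximate marginal is a weighted mixture of the conditional densities, so it remains a proper density as long as the weighted hyperparameter densities sum to one.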

The posterior marginal $p(\theta \mid y)$ of the hyperparameters is approximated using a Laplace approximation [71]:

$$
\tilde{p}(\theta \mid y) \propto \left. \frac{p(x, \theta, y)}{\tilde{p}_G(x \mid \theta, y)} \right|_{x = x^{*}(\theta)}
\tag{4}
$$

where the denominator $\tilde{p}_G(x \mid \theta, y)$ denoted the Gaussian approximation of $p(x \mid \theta, y)$ and $x^{*}(\theta)$ was the mode of the full conditional $p(x \mid \theta, y)$ [72].

According to Rue et al. [63], it is sufficient to "numerically explore" this approximate posterior density using suitable support points $\theta_k$ in (3). In this paper, these points were defined in the h-dimensional space using the strategy called central composite design. Here, centre points were augmented with a group of star points, which allowed the curvature of $\tilde{p}(\theta \mid y)$ to be estimated.

To approximate the first component of (3), a simplified Laplace approximation (less expensive from a computational point of view, with only a slight loss of accuracy) was used [62–64].