Spatio-temporal cancer mortality studies and various cancer atlases in Spain (López-Abente et al. 2007, 2014) have revealed geographical patterns for some tumours, which display the following characteristics: (1) there are spatial distribution patterns that are similar in men and women, (2) there are patterns that persist over time, and (3) the determinants of these patterns are very difficult to ascertain. Such characteristics would be common to tumours that shared risk factors which, among other things, included the chemical composition of the soil, since this generally remains stable over time, can contain carcinogens such as heavy metals and affects both sexes indiscriminately. Cancers of the upper gastrointestinal tract (stomach and oesophagus), pancreas, brain, kidney and thyroid all display the above characteristics.

This study was undertaken as a result of this line of reasoning and the fact that the Spanish Geological and Mining Institute (Instituto Geológico y Minero de España (IGME)) had recently published the “Geochemical Atlas of Spain”, the first geochemical study of surface materials to cover the entire country (Locutura et al. 2012). Our study thus comes within the sphere of geochemical mapping, a discipline that investigates the concentration levels and variability of different chemical elements, as well as their spatial distribution in the territory’s surface materials. In addition, it also seeks to explain the geogenic or anthropogenic factors which influence this distribution. The Geochemical Atlas of Spain not only contains a comprehensive description of the geochemical composition of the soil at two depth levels (horizons of 0–20 and 20–40 cm), but also enabled high-definition maps of the distributions of the various elements and their associations to be plotted, a feature whose utility far outweighs that of mere description (Locutura et al. 2012). Indeed, the map reveals that many of these elements display a singular spatial pattern which, in some cases, visually resembles the distribution of mortality due to certain tumours.

The presence of toxic metals in soil per se, and in soil impacted by mining (Fernández-Navarro et al. 2012), industry (García-Pérez et al. 2007), agriculture and urbanisation, is a major concern for both human health and ecotoxicology (Ranville 2005). Numerous epidemiological studies have found high-level exposures to arsenic and heavy metals to be associated with multiple cancer types, including bladder, colon, kidney, liver, lung, skin and prostate (Naujokas et al. 2013). There is far less information, however, on the health effects of low-dose chronic exposure to many trace metals (Centeno et al. 2013); studies on the health effects of metals and metalloids in topsoil belong to this latter category. The few studies that are available were undertaken in cultivated areas treated with xenobiotics, areas with excess incidence of some cancers and areas with known environmental threats (Zhao et al. 2014; Olawoyin et al. 2012; Pearce et al. 2012) or took the form of exploratory ecological studies (Huang et al. 2013; McKinley et al. 2013).

In this context, the aim of this study was to assess the possible association between arsenic and chromium topsoil levels and mortality due to 27 different tumour locations, with the resulting risk estimates being adjusted for socio-demographic variables and proximity to the industrial sources of these pollutants, as possible confounders.

Material and methods

Soil sampling and arsenic and chromium analysis

Across the period June 2008–November 2010, a total of 21,187 residual soil samples (13,505 from the surface horizon and 7682 from the deeper horizon) were collected at a total of 13,505 sampling points (13,317 in mainland Spain and 188 on the Canary and Balearic islands). Residual soil is a soil belonging to the geological substratum and therefore not transported. Three areas were pinpointed, with different sampling densities defined according to their geological complexity and demographic and industrial density (one sampling point/10 km², one point/20 km² and one point/100 km²). Figure 1 (top) shows the different sampling densities adopted and the location of the sampling points.

Fig. 1

Topsoil sampling sites in mainland Spain (upper). Triangulation of mainland Spain (lower): orange points denote soil sampling locations and green points, municipal centroids

The residual soil samples (from two horizons, upper and lower) were sieved to 2-mm fraction and then analysed by instrumental inductively coupled plasma mass spectrometry (ICP-MS) after crushing, pulverising and subsequent partial digestion (extraction by aqua regia). For study purposes, we selected the partial extraction results yielded by samples from the upper soil horizon. This choice was due to the fact that, in the event of trace elements being related to possible pollution, this soil sample horizon tends to display the highest trace element content. In the case of chemical analysis with partial extraction, this determination is regarded as being of greater interest for the study of the effect of trace elements on humans, by virtue of its coming closest to the bioavailable content in the sample. Specific bioaccessibility analysis was not undertaken in this study (Barsby et al. 2012; Dean 2010).

A detailed description of the sample collection and the chemical analysis techniques used can be found in the Geochemical Atlas of Spain (Locutura et al. 2012).

Mortality data

Municipal mortality data (observed cases) were drawn from the records of the National Statistics Institute (NSI) for the study period and corresponded to deaths due to 27 types of malignant tumours (see Supplementary data, Table S1, which shows the list of tumours analysed and their codes as per the International Classification of Diseases-9th and 10th Revisions). Population data were likewise drawn from NSI records. Expected cases were calculated by taking the specific rates for Spain as a whole, broken down by age group (18 groups: 0–4,…, 80–84 years and 85 years and over), sex and 5-year period (1999–2003, 2004–2008) and multiplying these by the person-years for each town, broken down by the same strata. Person-years for each quinquennium were calculated by multiplying the respective populations by 5 (with data corresponding to 2001 and 2006 being taken as the estimator of the population at the midpoint of the study period).
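The expected-case calculation described above can be sketched in a few lines of Python. This is only an illustration of indirect standardisation under stated assumptions: the function name `expected_cases`, the dictionary layout keyed by (sex, age group) and all the numbers are hypothetical and do not come from the NSI records.

```python
def expected_cases(reference_rates, town_population, years_per_period=5):
    """Expected deaths in one town for one 5-year period.

    reference_rates: {(sex, age_group): national deaths per person-year}
    town_population: {(sex, age_group): mid-period population of the town}
    Both layouts are assumptions made for this sketch.
    """
    total = 0.0
    for stratum, rate in reference_rates.items():
        # Person-years = mid-period population multiplied by the period length.
        person_years = town_population.get(stratum, 0) * years_per_period
        total += rate * person_years
    return total


# Made-up figures for two strata only, purely for illustration.
rates = {("men", "80-84"): 0.002, ("women", "80-84"): 0.001}
town = {("men", "80-84"): 150, ("women", "80-84"): 200}
print(expected_cases(rates, town))
```

In the study, the same sum would run over all 18 age groups, both sexes and both quinquennia, with the mid-period populations taken from 2001 and 2006.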

Statistical analysis

Cancer mortality data are aggregated at the town level, while the concentrations of arsenic and chromium in soil are measurements taken at sampling locations across the country (see Fig. 1, upper). In order to obtain a representative value of this concentration at the area level, an interpolation method (ordinary kriging) was used (Ribeiro and Diggle 2001; Diggle and Ribeiro 2006). The association between metal concentration in the soil and relative risk of cancer mortality was assessed in an ecological regression, where the response was the number of observed deaths from cancer, with expected cases as offset, and the exposure covariate was the kriging estimate of the metal concentration at the municipal centroid. This approach (approach A) has the advantage of computational simplicity but ignores the kriging error (Szpiro et al. 2011). Since some areas may contain very few sampling points and metal concentrations may show wide variations, the kriging error can vary substantially from one area to another.
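To make the plug-in step and the kriging error concrete, the following is a minimal ordinary-kriging sketch in plain Python. It is not the authors' implementation: the exponential covariance, its `sill` and `range_km` values, the planar coordinates in km and the small Gaussian-elimination solver are all assumptions chosen to keep the example self-contained.

```python
import math


def exp_cov(h, sill=1.0, range_km=50.0):
    """Exponential covariance; sill and range are illustrative assumptions."""
    return sill * math.exp(-h / range_km)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target, cov=exp_cov):
    """Return (prediction, kriging variance) at target.

    points: list of (x, y) sampling locations in km (assumed planar).
    Duplicate sampling points make the system singular and are not handled.
    """
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Covariances between samples, bordered by the unbiasedness constraint.
    a = [[cov(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [cov(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    prediction = sum(w * v for w, v in zip(weights, values))
    variance = cov(0.0) - sum(w * c for w, c in zip(weights, b[:n])) - mu
    return prediction, variance


samples = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
log_conc = [1.0, 2.0, 3.0]  # hypothetical log-concentrations
print(ordinary_kriging(samples, log_conc, (3.0, 4.0)))
```

The second returned value is the kriging variance, which grows as the target moves away from the sampling points; this is the quantity that approach A discards when only the prediction is carried into the regression.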

To take this into account, we therefore also adopted a second approach (approach B), whereby spatial variations in metal concentrations (topsoil sampling locations) and in relative risks of cancer mortality (town locations) were jointly modelled and estimated (spatially misaligned data).

Let $\mathrm{expos}_i$ denote the logarithm of the metal concentration in soil at each area centroid location $s_i$ and assume for the moment that these concentrations are known. In both approaches, we assume that the observed number of cases $O_i$ in the $i$th area is Poisson distributed, with mean $E_i\lambda_i$, where $E_i$ is the expected number of cases in that area and the relative risk $\lambda_i$ follows a log-linear model, such that

$$ \log\left(\lambda_i\right) = \alpha + \beta\,\mathrm{expos}_i + u_i + v_i, \tag{n.1} $$

where α is an intercept, β is the coefficient for the exposure covariate $\mathrm{expos}_i$, $v_i$ is the unstructured normal residual, and $u_i$ is the spatially structured effect which follows an intrinsic conditional autoregressive model, namely, the Besag, York and Mollié model (BYM) (Besag et al. 1991). Inference for the primary parameter of interest β is made in a Bayesian framework, and prior distributions are specified for all parameters.
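As a small illustration of the likelihood in model (n.1), the sketch below evaluates the linear predictor and the Poisson log-likelihood for a single town. The function names and every numerical value are hypothetical; in the actual analysis the random effects and β are estimated jointly in a Bayesian framework rather than supplied by hand.

```python
import math


def log_relative_risk(alpha, beta, expos, u, v):
    """Linear predictor: log(lambda_i) = alpha + beta * expos_i + u_i + v_i."""
    return alpha + beta * expos + u + v


def poisson_log_likelihood(observed, expected, log_rr):
    """Log-likelihood of O_i ~ Poisson(E_i * lambda_i) for one area."""
    mean = expected * math.exp(log_rr)
    return observed * math.log(mean) - mean - math.lgamma(observed + 1)


# Hypothetical town: 12 observed deaths against 9.5 expected.
log_rr = log_relative_risk(alpha=0.0, beta=0.1, expos=math.log(20.0),
                           u=0.05, v=-0.02)
print(math.exp(log_rr), poisson_log_likelihood(12, 9.5, log_rr))
```

The expected count enters only through the product with the relative risk, which is what "expected cases as offset" means on the log scale.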

In point of fact, the exposure covariate $\mathrm{expos}_i$ is not directly observed. Instead, we observe the metal concentration $c_j$ in soil at sampling locations $s_j$. For these observations, we assume the log-normal model

$$ \log\left(c_j\right) \sim \mathrm{Normal}\left(x_j, \sigma_x^2\right), \tag{n.2} $$

where $x_j$ is the realisation of a Matérn Gaussian field at location $s_j$ and $\sigma_x^2$ is a measurement error variance.

In approach A, the value of $x_i$ at each area centroid location $s_i$ is first predicted by ordinary kriging. Then, in a second step, the value of $\mathrm{expos}_i$ in the regression (n.1) is replaced by this prediction, and the unknown parameter β is estimated. This approach can thus be seen as a simple plug-in approach for the unobserved exposure variable $\mathrm{expos}_i$. In our second approach (approach B), by contrast, $\mathrm{expos}_i$ is a latent variable equal to $x_i$ and its relationship with the relative risk of mortality is assessed through joint estimation of models (n.1) and (n.2). Hence, the latter approach leads to more conservative credible intervals, as it takes into account the uncertainty in the exposure variable. Moreover, in approach B, the Gaussian field in model (n.2) was approximated using the stochastic partial differential equation (SPDE) approach (Lindgren et al. 2011; Lindgren and Rue 2015), as implemented in the integrated nested Laplace approximation software (R-INLA) (Rue et al. 2009; Rue and Martino 2010). This approach is based on a triangulated mesh of mainland Spain (see Fig. 1, bottom). The choice of the mesh resolution (number of vertices) is a compromise between the accuracy of this approach and computational costs. To resolve this trade-off, we used an information criterion based on the greatest length of the triangle edge allowed. For both arsenic and chromium data, the selected value of this length was 5 km. An extension of the mesh with a lower resolution was constructed around the Spanish mainland to control for boundary effects.

In addition to model (n.1), another ecological regression (n.3) was considered to account for potential socio-demographic and environmental confounding factors:

$$ \log\left(\lambda_i\right) = \alpha + \beta\,\mathrm{expos}_i + \sum_j \delta_j\,\mathrm{Soc}_{ij} + \gamma\,\mathrm{Indus}_i + u_i + v_i, \tag{n.3} $$

where the socio-demographic indicators ($\mathrm{Soc}_{ij}$) were obtained from the 1991 census and considered for their availability at the city level and potential explanatory ability vis-à-vis certain geographic mortality patterns (López-Abente et al. 2006). These indicators were as follows: population size (categorised into three levels: 0–2000 [rural zone]; 2000–10,000 [semi-urban zone]; and greater than 10,000 inhabitants [urban zone]); percentages of illiteracy, farmers and unemployment; average number of persons per household; and mean income. The covariate $\mathrm{Indus}_i$ indicates the presence (within 5 km) of industries with arsenic or chromium emissions (E-PRTR database, Spanish Ministry of Agriculture and Food and Environment 2007).
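One way such a proximity indicator could be derived is sketched below. It assumes, for illustration only, that distance is measured as the great-circle (haversine) distance between the municipal centroid and each facility, with coordinates in decimal degrees; the text does not state how the 5-km distance was actually computed, and the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def industry_indicator(centroid, facilities, radius_km=5.0):
    """Return 1 if any emitting facility lies within radius_km, else 0."""
    return int(any(haversine_km(*centroid, *f) <= radius_km
                   for f in facilities))


# Hypothetical coordinates, not taken from the E-PRTR database.
town_centroid = (40.00, -3.70)
emitters = [(40.03, -3.70), (41.00, -3.70)]
print(industry_indicator(town_centroid, emitters))
```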

In the results shown below, the exposure covariate $\mathrm{expos}_i$ was treated as a factor categorised into quartiles, with the first quartile as the reference. To find out whether relative risk (RR) increased with exposure level (trend test), the quartile ordinal was included as a continuous variable. This categorisation does not apply to approach B, however, since there the exposure variable must be Gaussian; the log of the exposure covariate was used instead.
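The two codings of exposure described above, a quartile factor with the first quartile as reference and the quartile ordinal used as a continuous score for the trend test, can be sketched as follows. The helper names are hypothetical, and the cut points use the default method of Python's `statistics.quantiles`, which may differ slightly from the quantile definition used in the original analysis.

```python
import bisect
import statistics


def quartile_of(values):
    """Assign each value its quartile (1 to 4) within the sample."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    # A value equal to a cut point is placed in the lower quartile.
    return [bisect.bisect_left(cuts, x) + 1 for x in values]


def dummy_coding(quartile):
    """Indicators for Q2, Q3 and Q4; Q1 is the reference (all zeros)."""
    return [int(quartile == q) for q in (2, 3, 4)]


# Hypothetical kriged topsoil levels (mg/kg) for eight towns.
levels = [4.1, 7.9, 12.5, 15.0, 18.2, 22.7, 30.4, 55.0]
quartiles = quartile_of(levels)
for level, q in zip(levels, quartiles):
    # q itself is the ordinal score entered as continuous in the trend test.
    print(level, q, dummy_coding(q))
```

With the factor coding, each of Q2 to Q4 gets its own RR relative to Q1; with the ordinal coding, a single coefficient summarises the change in log RR per quartile step.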

Descriptive maps were plotted showing the kriging estimate categorised into quantiles of arsenic and chromium topsoil levels in the respective towns included in the study and, by way of example, maps of the distribution of the standardised mortality ratios for cancer of the oesophagus, smoothed using the BYM model. The distribution of other tumours can be found in López-Abente et al. (2014).

Results


Across the 10 years of study, a total of 861,440 deaths occurred due to the tumours analysed. Table S1 of the supplementary material shows the distribution of these deaths by sex and cancer site.

The mean topsoil concentrations in towns in the study area are shown in Table 1. Soil levels ranged from 0.50 to 2100 mg kg−1 for chromium and from 0.10 to 2510 mg kg−1 for arsenic. The interpolation procedure reduced the range of determinations in both elements, basically influencing the extreme values. The proximity of the sources of the pollutants studied had the effect of slightly altering their distribution: the difference in means for these elements between towns with and without emissions in the environs, obtained from their distribution a posteriori (using uninformative priors) (Kruschke 2013), was 1.439 mg kg−1 (95 % credibility interval (95 % CI) 1.005–1.862) for arsenic and 1.334 mg kg−1 (95 % CI 0.808–1.853) for chromium.

Table 1 Study of arsenic and chromium topsoil levels (mg kg−1), in interpolation by towns and by strata of proximity to industrial emissions

Figure 2 shows the kriging estimate categorised into quantiles of arsenic and chromium topsoil levels in the various towns included in the study. The highest number of towns in the upper quantiles of chromium in soil was mainly observed in the north-west and south-east and at other points in northern and eastern areas of the territory. In the case of arsenic, it was again the northern and eastern areas that registered the highest number of towns in the upper quantiles.

Fig. 2

Municipal distribution of chromium and arsenic topsoil concentrations in mainland Spain. Chromium concentrations (mg kg−1) (upper); arsenic concentrations (mg kg−1) (lower)

By way of example, Fig. 3 shows a map depicting the distribution of smoothed relative risk for oesophageal cancer mortality obtained using the BYM model. This map covers the period 1999–2008, and though it shows a pattern displaying similarities between the sexes, attention should nonetheless be drawn to the differences to be seen across a wide area of western Andalusia and in the north of the peninsula.

Fig. 3

BYM modelling of oesophageal cancer mortality in men (left) and women (right) over a 10-year period. The maps depict the posterior mean of relative risk for every town. Spain 1999–2008

Tables 2 and 3 show the statistically significant results (marked in italics) of the analyses of association between chromium and arsenic topsoil levels in Spanish towns and mortality due to the selected causes of cancer, for both men and women. Also shown are the RRs and their credibility intervals (95 % CIs) for men and women yielded by the models, i.e., the BYM model with the element as the only explanatory variable (model n.1), and this same model adjusted for socio-demographic variables and proximity to industries that release the pollutant in question into the environment (model n.3). The results of the dose-response analysis (trend test) and those of the models obtained with approach B are given too. The results are reported in their entirety in the supplementary material (Tables S2 and S3).

Table 2 Summary of the estimates of the effect (RR) of chromium topsoil levels, categorised in quartiles, on mortality due to different tumour types, by sex
Table 3 Summary of estimates of the effect (RR) of arsenic topsoil levels, categorised in quartiles, on mortality due to different tumour types, by sex

In the case of chromium, no association whatsoever was found in men. In women, however, irrespective of the proximity of chromium-releasing industries, chromium topsoil levels in the upper as opposed to the lower quartile were associated with mortality due to cancer of the buccal cavity and pharynx (RR 1.149, 95 % CI 1.036–1.274), cancer of the oesophagus (1.328, 1.146–1.544), non-Hodgkin’s lymphoma (NHL) (1.092, 1.018–1.170) and breast cancer (1.045, 1.009–1.082). The trend in RR by quartile of chromium concentration was significant for all four tumour sites (trend test).

For arsenic, the towns included in the upper versus the lower quartile of arsenic concentrations (reference) displayed excess mortality due to cancers of the brain, stomach, pancreas and lung and NHL among men and women alike. This association was predominantly observed in the model adjusted for socio-demographic variables and industrial emissions of arsenic, showing a statistically significant increase in RR with the quartiles (trend test). The tumours that showed a statistical association in men only were those of buccal cavity and pharynx, oesophagus and colorectal and kidney cancer. Prostate cancer was also associated with arsenic in soil.

Discussion


Studies of the geographical distribution of cancer mortality in Spain have revealed the existence of different spatial patterns for different cancer sites that are difficult to explain. The aetiology of malignant tumours is highly complex, owing to the presence of many determinants of differing nature (environmental, including habits, diet, environs and occupation; biological; and genetic), some of which (environmental) are linked to the territory and some of which (biological and genetic) are not.

The results of this study suggest that low bioavailable arsenic levels in soil might give rise to a population exposure that was statistically associated with higher mortality due to cancers of the stomach, pancreas, lung and brain and NHL, among men and women alike. While chromium topsoil levels were associated with higher female mortality due to cancers of the upper gastrointestinal tract (buccal cavity, pharynx and oesophagus), breast cancer and NHL, no such association was found in men.

Arsenic is a known carcinogen in the skin, lung, bladder, liver and kidney, with the evidence suggesting that lung cancer is the most common cause of arsenic-related mortality (IARC 2012). People can be exposed to arsenic in food and water and from inhalation, e.g. breathing sawdust or smoke from burning arsenic-treated wood or fly ash from combustion of As-rich coal (ATSDR 2007).

Current evidence indicating that exposure to arsenic is a risk factor for cancer in the general population comes from occupational studies based on cohorts of workers who inhaled air contaminated by arsenic and other products and from studies in places with populations exposed to high arsenic concentrations in drinking water over prolonged periods of time (Straif et al. 2009). These have highlighted its association with the increase in incidence of lung, bladder, skin, kidney, liver and possibly prostate cancer (Nordstrom 2002). Currently, the greatest interest in the toxicology of arsenic lies in exposure deriving from this substance’s natural presence in food, water and soil. Understanding the environmental levels that could cause public health problems is thus a critical research area (Hughes et al. 2011).

In the USA, a nationwide survey conducted in areas that were judged not to have anthropogenic sources of arsenic reported that natural background concentrations in soil ranged from less than 1 to 97 mg kg−1 (Shacklette and Boerngen 1984). According to our study data, the range was very similar, i.e. 1 to 99.4 mg kg−1 (Locutura et al. 2012). Owing to low arsenic bioavailability in soil, it is believed that, as compared to intake of naturally occurring arsenic in water and diet, soil arsenic constitutes only a small fraction of intake (Boyce et al. 2008). In the US population, the major food contributors to inorganic As exposure were the following: vegetables (24 %); fruit juices and fruits (18 %); rice (17 %); beer and wine (12 %); and flour, corn and wheat (11 %). Approximately 10 % of total As exposure from foods is in the form of toxic inorganic As (Xue et al. 2010).

Furthermore, the concentration of heavy metals in soil also determines their presence in animal tissue (López Alonso et al. 2002), and the use of biomarkers in cattle has been suggested as a way of monitoring these elements in the environment, since they avoid the problem of bioavailability posed by soil samples.

The small number of studies means that there is very little epidemiological evidence of the association between arsenic topsoil levels and frequency of cancer. However, heavy metal and arsenic topsoil concentrations serve as an indicator of long-term exposure to these elements (Tchounwou et al. 2012). A recent study on arsenic topsoil levels and cancer undertaken in a province in China reported an association with mortality due to cancers of the colon, stomach, kidney, lung and nasopharynx (Chen et al. 2015): this study included 83 towns, 1683 top soil samples and mortality across the period 2005–2010. Although the dimensions of our study were very different, in view of the fact that it covered a 10-year mortality period from 1999 to 2008, included all 7917 towns across mainland Spain and used 13,317 sampling points in estimating arsenic and chromium levels, there is a certain coincidence in terms of the tumour sites for which excess risk was found.

Numerous studies have identified associations between lung cancer and inhaled hexavalent chromium (Cr(VI)) in occupational settings. Furthermore, it is a component of the carcinogenicity of tobacco smoke. Chromium may possibly cause gastrointestinal tract cancer due to drinking Cr-laden water and eating Cr-laden vegetables (Peralta-Videa et al. 2009; Welling et al. 2015). Inhalation of Cr(VI) has occurred in a number of industries, including leather tanning, chrome plating, cement works and stainless steel welding and manufacturing.

It is noteworthy that, in addition to breast cancer, our study observed the association between chromium concentrations and cancers of the buccal cavity, pharynx and oesophagus and NHL exclusively in women. To our knowledge, the origin of this differential risk is unknown, though in the case of cancer of oesophagus, it might be linked to the different geographical mortality pattern. One possible explanation could be that exposure to food and drinking water containing chromium has greater toxicity because it can take place over the long term (e.g., lifetime) and is more likely to occur at particularly susceptible life stages (e.g., in foetuses, children and pregnant women) than in occupational exposures (Welling et al. 2015).

Heavy metal pollution in soil has received much attention because metals are hardly decomposable by soil microbes and can amplify with food chain extension, which in turn poses a potential threat to human health (Li et al. 2014). Human beings could be exposed to heavy metals from vegetable soils via the following six main pathways: (1) direct ingestion of soil particles, (2) dermal contact with soil particles, (3) diet through the food chain, (4) inhalation of soil particles from the air, (5) oral intake from groundwater and (6) dermal intake from groundwater (Abrahams 2002; Liu et al. 2013).

We are unaware of the existence of any study comparable to ours in terms of dimension and scope. Our study encompasses the whole of mainland Spain, contains an estimate of As and Cr topsoil levels for close on 8000 towns obtained from a mesh of more than 13,000 sampling points and covers a broad study period spanning mortality over 10 years. Statistical analysis was performed using hierarchical models with a spatial component (Besag et al. 1991) fitted by R-INLA (Lindgren and Rue 2015). In these models, the risk of falling into the ecological fallacy is minimised by using a very small spatial scale and making no inferences at an individual level (Clayton et al. 1993). Moreover, to account for the spatial interpolation error in the inference, a multivariate model for spatially misaligned data is used (the set of observed locations for the explanatory variable is not identical to that for the response variable) (Cameletti et al. 2013). In this model, the inference is arrived at using the SPDE approach (Lindgren et al. 2011), which makes it computationally feasible and efficient. Although this model only allows the RR to be estimated with exposure as a continuous variable, in many cases its estimates confirm the results of the previous analyses: they are more conservative but generally point in the same direction of association.

Data from soil geochemical studies are usually recorded in parts per million or milligrams per kilogram and have been described as compositional data requiring specific transformations (Aitchison 1994, 2003). We recognise the inherently compositional, multivariate nature of soil data, but this aspect was not explored in this study and warrants further development.

Insofar as limitations are concerned, it should be noted that this was an ecological mortality study with all the problems of using data grouped by town. The study assumed that As and Cr topsoil levels determined each town’s population exposure, and data on possible important confounding variables, such as smoking habit, were lacking. Even so, an effort was made to control for such confounders, by including a series of socio-demographic components as variables of adjustment and by attempting to control for the anthropogenic origin of As and Cr through data on the proximity of the sources of these elements.

Furthermore, it is important to stress that residing in a town with Cr and As levels in the upper quartile in no way implies that their spatial location would in itself give rise to any given cancer. The influence on the population of other socio-demographic and lifestyle factors and other exposures must be borne in mind when it comes to assessing the associations found.

With respect to possible intervention measures, a constant factor when reviewing publications relating to metal and metalloid soil concentration is the warning sounded by researchers as to the importance of controlling and limiting As levels in both soil and, due to its incorporation into the trophic chain, food (Micó et al. 2007; Burló et al. 2012; Muñoz et al. 2000; Peralta-Videa et al. 2009; van Geen et al. 1997; Delgado-Andrade et al. 2003; Peña-Fernández et al. 2014).

To conclude, the results show a statistical association in men and women alike between arsenic topsoil concentration and mortality due to cancers of the stomach, pancreas, lung and brain and NHL. Furthermore, in men an association was observed with cancers of the buccal cavity and pharynx, colon and rectum, kidney and prostate. Chromium topsoil levels were associated with higher mortality in women due to cancer of the upper gastrointestinal tract, breast cancer and NHL, but no such association was found in men.

Access to soil composition data and their inclusion in epidemiological studies of human health is highly innovative and opens an important avenue for understanding the set of exposures that determine the frequency of cancer and other chronic diseases. Moreover, the geochemical atlas, with geocoded data for an entire country, is a major contribution to environmental epidemiology and public health in general.