Introduction

Heavy metal accumulation in soils caused by industrialization has attracted wide attention. Soil contamination by metals has become a serious and widespread problem in many parts of the world, especially in developing countries. Heavy metals in soil may come from agricultural activities (Huang and Jin 2008), urbanization, industrialization, and mining activities (Zhong et al. 2012). Among these, intensive agricultural and industrial activities are considered two of the most damaging anthropogenic activities in the world. According to numerous studies, heavy metal pollution in the environment is mainly derived from anthropogenic sources (Acosta et al. 2011; Gu et al. 2012; Bini et al. 2011; Li and Feng 2012b; Chabukdhara and Nema 2012; Cai et al. 2012; Guo et al. 2012; Zhang et al. 2009). Pollutants (e.g., fertilizers, pesticides, heavy metals, and salts) produced by human activities (e.g., agriculture, industry, and urbanization) drastically affect the quality of soil and water. These activities lead to heavy metal accumulation in soils, which in turn results in changes in landscape, contamination of soil and water, and degradation of land resources. Therefore, the protection of soil quality is vital.

The spatial distribution of heavy metals in soils is of primary significance in assessing soil quality and locating pollution sources (Gu et al. 2012). However, a variety of monitoring programs have generated numerous intricate datasets that are difficult to interpret because of the interactive influences. Thus, multivariate geostatistical and geographic information system (GIS) methods have been widely employed as powerful tools to extract the majority of meaningful information from datasets without losing useful information (Gu et al. 2012; Acosta et al. 2011; Li and Feng 2012b; Lin et al. 2010; Yang et al. 2011; Chabukdhara and Nema 2012; Cai et al. 2012; Bini et al. 2011; Lu et al. 2012; Guo et al. 2012).

Multivariate analysis offers techniques for classifying relationships among measured variables. The two most common multivariate analyses are principal components analysis and cluster analysis. Notable examples of their recent use in assessment of soil pollution by heavy metals are found in the following reports (Li and Feng 2012b; Acosta et al. 2011; Gu et al. 2012; Lu et al. 2012; Cai et al. 2012; Zhong et al. 2012; Chabukdhara and Nema 2012; Yang et al. 2011).

Principal component analysis, a statistical technique, linearly transforms an original set of variables into a substantially smaller set of uncorrelated new variables that represent most of the information of the original dataset (Chabukdhara and Nema 2012). A small set of uncorrelated variables is much easier to understand and use in further analysis than a larger set of correlated variables. The purpose of cluster analysis is to identify groups or clusters of similar sites on the basis of similarities within a class and dissimilarities between different classes. Agglomerative hierarchical clustering (AHC) examines distances between samples. The most similar points are grouped forming one cluster and the process is repeated until all points belong to a cluster. The results obtained are presented in a two-dimensional plot called a dendrogram (Irpino and Verde 2006; Ward 1963; Everitt et al. 2001).

Geostatistical techniques are based on the theory of a regionalized variable (Matheron 1963), which is distributed in space (with spatial coordinates) and shows spatial autocorrelation such that samples close together in space are more alike than those that are further apart. A geostatistical technique measures the spatial variability and provides for spatial interpolation (Webster and Oliver 2007; Shi et al. 2007).

Hamedan county is one of the most developed areas in Iran, and its agriculture makes it one of the centers of Iran’s economic growth. In recent decades, the development of industry and the overuse of fertilizers in agriculture have increased the risk of pollution by metallic elements. There were no previous studies on heavy metal pollution in this area, and thus in 2010, the Hamedan Department of Environment (Hamedan-DoE) began a collaborative research program to identify the presence of Cr, Co, Zn, V, Cu, Pb, Ni, Fe, As, and Cd in soils. The results of this project are presented in this article.

The study involved factor analysis of 130 sampling sites in Iran’s Hamedan county to characterize the patterns of ten heavy metals in the soil (Cr, Co, Zn, V, Cu, Pb, Ni, Fe, As, and Cd). A new integrated statistical approach was applied, in which enrichment factors were determined to assess metal enrichment in soil under different geological structures and land uses. The aims were to interpret the background concentration of metals, examine the spatial dependency and variation mechanisms of the metals, evaluate their spatial distribution patterns, and delineate polluted areas. Correlations between the factor patterns and the physical soil parameters were analyzed to elucidate the characteristics of the heavy metal pollution of the soils at these 130 sampling sites. Agglomerative hierarchical clustering and the gap statistic were also applied to group the metals and physical parameters, and thereby delineate the interrelationships between metals and anthropogenic activities.

Materials and methods

Study area

The study area includes four adjacent townships (Kaboudarahang, Razan, Hamedan, and Bahar) located in Hamedan county in the west of Iran, comprising a total area of 118,577.2 ha, including 63,316.55 ha (53.4 %) of agricultural land (between latitudes 33°59′ and 35°48′ and longitudes 47°34′ and 49°36′; Fig. 1). The predominant crops in these areas are wheat, barley, alfalfa, potato, garden crops, and orchards. Agriculture makes Hamedan province one of the centers of Iran’s economic growth. This is a semi-arid region with a mean annual rainfall of 400 mm, which varies within the year.

Fig. 1 Study area and sampling sites

Spatial data

The geology map with a scale of 1:100,000 was acquired from the National Cartographic Center sites in the network formation. Preparation of land use maps began with field studies in March 2010 using multitemporal satellite IRS-P6 AWiFS data (6 March, 3 April, 27 April, and 18 July) and digital topographic maps (of scale 1:50,000).

Ground-truth data collected in the field were integrated with the satellite data. The images were first corrected and georeferenced. The Optimal Index Factor (OIF) was then used to select the best false color composite, i.e., the three bands with the minimum correlation and maximum variance. The function is expressed as:

$$ \mathrm{OIF}=\frac{\sum_{i=1}^{3}{\mathrm{SD}}_i}{\sum_{j=1}^{3}\left|{\mathrm{CC}}_j\right|} $$
(1)

where \( \sum_{i=1}^{3}{\mathrm{SD}}_i \) is the sum of the standard deviations of the three bands, and \( \sum_{j=1}^{3}\left|{\mathrm{CC}}_j\right| \) is the sum of the absolute correlation coefficients between each pair of the three bands.
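As an illustration of Eq. 1, the following minimal Python sketch computes the OIF for one three-band combination and ranks all three-band combinations of a small band set. The function names (`oif`, `best_triplet`), the use of the population standard deviation, and the band names in the usage note are assumptions made for this example; they are not taken from the image-processing software used in the study.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def oif(bands):
    """OIF of three bands: sum of SDs over sum of absolute pairwise correlations."""
    sd_sum = sum(pstdev(b) for b in bands)  # population SD is an assumption
    cc_sum = sum(abs(pearson(a, b)) for a, b in combinations(bands, 2))
    return sd_sum / cc_sum


def best_triplet(band_dict):
    """Return the names of the three-band combination with the highest OIF."""
    return max(
        combinations(sorted(band_dict), 3),
        key=lambda names: oif([band_dict[n] for n in names]),
    )
```

For example, `best_triplet({"B2": [...], "B3": [...], "B4": [...], "B5": [...]})` would return the three band names whose composite has the largest OIF; the band names here are placeholders.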

Hybrid image classification was carried out for the preparation of land use maps using ERDAS IMAGINE 9.1 software. In order to assess thematic accuracy, a stratified random sampling design was used to select a total of 360 pixels for the land use map. The accuracy of the land use map was checked using GPS points obtained during the March and July 2010 fieldwork as reference. Classified and reference information were cross-tabulated in an error matrix. Errors of commission (reported as user’s accuracy), errors of omission (reported as producer’s accuracy), and the overall accuracy were calculated. The map of the locations of industries and mines was prepared using GPS data obtained during the fieldwork.
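The accuracy measures derived from an error matrix can be sketched as follows. This is a minimal example that assumes rows hold the classified (map) classes and columns hold the reference classes; the function name `accuracy_report` and the layout are assumptions for illustration, not the output format of the software used in the study.

```python
def accuracy_report(matrix, classes):
    """Summarize an error matrix.

    matrix[i][j] is the number of pixels classified as class i (row)
    whose reference class is j (column); this orientation is an assumption.
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"overall": correct / total}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                       # pixels mapped as class i
        col_total = sum(matrix[r][i] for r in range(n))  # reference pixels of class i
        users = matrix[i][i] / row_total
        producers = matrix[i][i] / col_total
        report[name] = {
            "users_accuracy": users,
            "commission_error": 1 - users,
            "producers_accuracy": producers,
            "omission_error": 1 - producers,
        }
    return report
```

The commission error of a class is the complement of its user’s accuracy, and the omission error is the complement of its producer’s accuracy.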

Soil sampling

In accordance with soil types and the uniformity of sampling distribution in the study area, in April 2010, the area was divided into 5 × 5 km grids using a random systematic method, and 130 composite soil samples were collected (0–20 cm depth). The distribution of sampling points is presented in Fig. 1. The coordinates of the sample points were recorded using GPS and were plotted on a topographic map with a scale of 1:50,000. The samples were air dried. Gravel, coarse organic matter, and plant root residues were removed. Soil samples were dried in an oven at 105 °C for 24 h. The dried samples were passed through a 2-mm plastic sieve and stored in 1-kg plastic bags prior to chemical analysis.

Chemical analysis

Selected soil properties relevant to mobility and bioavailability of heavy metals were analyzed for the general characterization of soils. Particle size distribution was measured by the hydrometric method to determine the sand, silt, and clay percentages. Soil pH (soil:H2O ratio = 1:2) was measured using a pH meter with a glass electrode (Shi et al. 2007). Electrical conductivity was determined on the saturation paste extract (Micó et al. 2006). Organic matter concentration was determined by the Walkley–Black method. A 0.5-g milled soil sample was digested with 8 ml of HNO3 (65 %), 5 ml of HCl (37 %), and 1.5 ml of HF (40 %) in a Milestone Microwave (Milestone Ethos 900 plus Mod. 44062) in accordance with the ISO 11466 procedure (International Organization for Standardization, 1995). Metal contents were analyzed by inductively coupled plasma atomic emission spectrometry for Cr, Co, Zn, V, Cu, Pb, Ni, Fe, and Cd, and by hydride generation atomic fluorescence spectrometry for As (Fu et al. 2008).

Quality assurance and quality control

The reagent blanks were monitored throughout the analysis and were used as required to correct the analytical results. All analyses were performed at the Iran Mineral Processing Research Center. All samples were analyzed in triplicate, with the analytical process monitored using certified reference standards.

Background concentration of metals

Background levels of heavy metals were determined using soil samples from virgin regions, e.g., pristine rangelands and regions far from human activities. The mean of each metal was calculated, and these virgin points were overlaid onto geological maps to produce the background concentration of each metal.

Statistical analysis using computer software

Multivariate analysis

Prior to statistical analysis, the datasets were evaluated using the Kolmogorov–Smirnov method, and when the distribution was not normal, the data were log-transformed (Shi et al. 2007) before statistical treatment. Outliers can produce discontinuities in the data that violate the assumptions of geostatistical theory. Box plots were therefore used to identify outliers. Values outside \( \overline{x} \) ± 3 SD were replaced with the maximum or minimum of the raw data lying within \( \overline{x} \) ± 3 SD. Pekey (2006) and Inácio et al. (2008) also used this method to assess and correct outliers in their studies.
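The outlier replacement rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample standard deviation is used (the text does not state which estimator was applied); the function name `cap_outliers` is hypothetical.

```python
from statistics import mean, stdev


def cap_outliers(values, k=3.0):
    """Replace values outside mean +/- k*SD with the nearest retained extreme.

    Values above the upper bound become the maximum of the values inside
    the bounds; values below the lower bound become the minimum of those.
    The sample standard deviation (n - 1) is an assumption.
    """
    m, sd = mean(values), stdev(values)
    lower, upper = m - k * sd, m + k * sd
    inside = [v for v in values if lower <= v <= upper]
    vmin, vmax = min(inside), max(inside)
    return [vmax if v > upper else vmin if v < lower else v for v in values]
```

With twenty made-up concentrations of which one is extreme, e.g. ten values of 10, nine values of 12, and one value of 1000, only the extreme value is replaced (by 12).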

All statistical analyses were thus based on normally distributed data. Descriptive statistics (mean, standard deviation, range, and coefficient of variation) of metals and soil properties were derived using SPSS® 16.0 for Windows. The t test was used to determine the existence of significant differences among soil properties and metals. Multivariate analysis was performed using SPSS® software and XLSTAT, an add-in package for Microsoft Excel 2010. AHC and the gap statistic (Gs) were used to identify differences between the classes and to cluster the samples with similar metal content and soil properties. AHC was undertaken according to Ward’s method. Results are presented in a dendrogram where steps in the hierarchical clustering solution and values of distances between clusters (squared Euclidean distance) are represented (Chabukdhara and Nema 2012). The gap statistic is a tool for estimating the number of clusters (groups) in a set of data (Tibshirani et al. 2001). The within-cluster point scatter on which the gap statistic is based is expressed as (Dudoit and Fridlyand 2002):

$$ {W}_k=\frac{1}{2}\sum_{r=1}^{k}\sum_{i\in {C}_r}\sum_{j\in {C}_r}d\left(i,j\right) $$
(2)

Varying the total number of clusters k = 1, …, K gives the within-cluster point scatters \( {W}_k \). B reference datasets are generated from a uniform distribution over the range of the observed data, and each one is clustered, giving the within-cluster point scatters \( {W}_{kb}^{*} \), b = 1, …, B, k = 1, …, K. The gap statistic is computed as follows:

$$ \mathrm{Gap}(k)=\frac{1}{B}{\displaystyle \sum_{b=1}^B \log \left({W}_{kb}^{*}\right)- \log \left({W}_k\right)} $$
(3)

The standard deviation of the B replicates \( \log \left({W}_{kb}^{*}\right) \) is computed as:

$$ {\mathrm{sd}}_k={\left[\frac{1}{B}{\displaystyle \sum_{b=1}^B{\left( \log \left({W}_{kb}^{*}\right)-\overline{i}\right)}^2}\right]}^{1/2} $$
(4)
$$ \mathrm{where}\kern0.5em \overline{i}=\frac{1}{B}{\displaystyle {\sum}_{b=1}^B \log \left({W}_{kb}^{*}\right)} $$
(5)

The number of clusters was chosen as

\( \widehat{k} \) = the smallest k such that \( \mathrm{Gap}(k)\ge \mathrm{Gap}\left(k+1\right)-{\mathrm{sd}}_{k+1} \) (Tibshirani et al. 2001).
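The quantities in Eqs. 2 to 5 and the selection rule can be sketched in Python as follows. This is a minimal illustration rather than the XLSTAT implementation: the clustering itself is assumed to have been done elsewhere (cluster labels are passed in), d(i, j) is taken as the Euclidean distance, and the function names are hypothetical.

```python
import math


def within_scatter(points, labels):
    """W_k: half the sum of pairwise distances d(i, j) within each cluster.

    Euclidean distance is an assumption; any dissimilarity could be used.
    """
    total = 0.0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if labels[i] == labels[j]:
                total += math.dist(p, q)
    return 0.5 * total


def gap_and_sd(w_k, w_ref):
    """Gap(k) and sd_k from the observed W_k and the B reference values W*_kb."""
    logs = [math.log(w) for w in w_ref]
    mean_log = sum(logs) / len(logs)
    gap = mean_log - math.log(w_k)
    sd = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / len(logs))
    return gap, sd


def choose_k(gaps, sds):
    """Smallest k with Gap(k) >= Gap(k+1) - sd_(k+1); gaps[0] holds Gap(1)."""
    for idx in range(len(gaps) - 1):
        if gaps[idx] >= gaps[idx + 1] - sds[idx + 1]:
            return idx + 1
    return len(gaps)  # no k satisfied the rule; fall back to the largest k tried
```

In practice, `within_scatter` would be evaluated for each k on the observed data and on each of the B uniform reference datasets, and the resulting lists of gaps and standard deviations passed to `choose_k`.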

Principal component analysis (PCA) with Varimax normalized rotation was used for interpreting relationships and the hypothetical source of metals (lithogenic or anthropogenic). Varimax rotation was applied because the orthogonal rotation minimizes the number of variables with a high loading on each component and facilitates the interpretation of results (Acosta et al. 2011; Bini et al. 2011; Zhong et al. 2012; Li and Feng 2012a; Cai et al. 2012; Guo et al. 2012; Yang et al. 2011).

Geostatistical methods

Geostatistics was used to describe the geospatial distribution of the ten heavy metals at the regional scale. The geospatial data were compiled, merged, and loaded, and spatial interpolation was performed with a geostatistical extension. The extension uses the semivariogram (or variogram) to measure the spatial variability of a regionalized variable and provides the input parameters for spatial interpolation by kriging (Webster and Oliver 2007). Kriging is an interpolation method applied widely to elucidate the spatial distribution of many parameters, including metallic elements. Kriging interpolation refers to a group of spatial interpolation methods for assigning a value of a random field to an unsampled location based on the measured values of the random field at nearby locations (Xie et al. 2011). A semivariogram model must be fitted before kriging can be performed. Semivariogram model fitting was conducted using ILWIS 3.6®, and the spatial distribution was obtained using disjunctive kriging. Experimental semivariograms of soil heavy metal concentrations were calculated and fitted with the available spherical, exponential, Gaussian, wave, rational quadratic, circular, and power models (Hendrikse 2000). The experimental semivariogram of the discrete soil samples is expressed as:

$$ \widehat{\gamma}=\sum {\left({Z}_i-{Z}_{i+h}\right)}^2/2n $$
(6)

where \( \widehat{\gamma} \) is the semivariogram value of points separated by a certain distance (h); \( {Z}_i \) is the value of point i; \( {Z}_{i+h} \) is the value of a point at distance h from point i; \( \sum {\left({Z}_i-{Z}_{i+h}\right)}^2 \) is the sum of the squared differences between point values of all point pairs within a certain distance class; and n is the number of point pairs within a distance class.
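Equation 6 can be illustrated with a short Python sketch that bins all point pairs into distance classes and computes the semivariance per class. The function name, the equal-width distance classes, and the half-open class boundaries are assumptions made for this example; they do not describe the ILWIS implementation.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Experimental semivariogram over equal-width distance classes.

    Returns a list of (class midpoint, semivariance, pair count) for every
    non-empty class. Class k covers distances in [k*lag_width, (k+1)*lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [
        ((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]
```

Each pair is counted once, so the divisor 2n in Eq. 6 corresponds to `2 * counts[k]`.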

The semivariogram models are expressed as follows:

Spherical model:

for 0 < h ≤ a

$$ \gamma (h)={C}_0+C\times \left[\frac{3h}{2a}-\frac{h^3}{2{a}^3}\right] $$
(7)

for h > a

$$ \gamma (h)={C}_0+C $$
(8)

Exponential model:

$$ \gamma (h)={C}_0+C\times \left[1-{e}^{-\frac{h}{a}}\right] $$
(9)

Gaussian model:

$$ \gamma (h)={C}_0+C\times \left[1-{e}^{-{\left(\frac{h}{a}\right)}^2}\right] $$
(10)

Wave model:

$$ \gamma (h)={C}_0+C\times \left[1-\frac{\sin \left(\frac{h}{a}\right)}{\frac{h}{a}}\right] $$
(11)

Rational quadratic model:

$$ \gamma (h)={C}_0+C\times \left[\frac{\frac{h^2}{a^2}}{1+\frac{h^2}{a^2}}\right] $$
(12)

Circular model:

for 0 < h ≤ a

$$ \gamma (h)={C}_0+C\times \left\{1-\frac{2}{\pi}\times \arccos \left[\frac{h}{a}\right]+\frac{2h}{\pi a}\sqrt{1-\frac{h^2}{a^2}}\right\} $$
(13)

for h > a

$$ \gamma (h)={C}_0+C $$
(14)

Power model:

$$ \gamma (h)={C}_0+k\times {h}^m $$
(15)

where h is the distance, \( {C}_0 \) is the nugget variance, \( {C}_0+C \) is the sill, a is the range, k is the slope of the power function, and m is the power exponent (0 < m < 2).
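Several of the models above (Eqs. 7 to 10, 13, and 14) can be written directly as Python functions, together with a coefficient of determination for comparing a fitted model with experimental semivariogram values. This is a minimal sketch for h > 0; the function names and the particular form of \( R^2 \) are assumptions for illustration and are not taken from the software used in the study.

```python
import math


def spherical(h, c0, c, a):
    """Spherical model: rises to the sill c0 + c at the range a."""
    if h > a:
        return c0 + c
    return c0 + c * (3 * h / (2 * a) - h ** 3 / (2 * a ** 3))


def exponential(h, c0, c, a):
    """Exponential model: approaches the sill asymptotically."""
    return c0 + c * (1 - math.exp(-h / a))


def gaussian(h, c0, c, a):
    """Gaussian model: parabolic behaviour near the origin."""
    return c0 + c * (1 - math.exp(-((h / a) ** 2)))


def circular(h, c0, c, a):
    """Circular model: reaches the sill c0 + c at the range a."""
    if h > a:
        return c0 + c
    x = h / a
    return c0 + c * (
        1 - (2 / math.pi) * math.acos(x) + (2 * x / math.pi) * math.sqrt(1 - x * x)
    )


def r_squared(observed, predicted):
    """Coefficient of determination between experimental and modelled values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

A model comparison could then evaluate each candidate at the experimental lag distances and keep the one with the highest `r_squared`.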

In developing a spatial distribution map of the polluted areas, the spherical, exponential, Gaussian, wave, rational quadratic, circular, and power semivariogram models were compared in order to determine the best model for estimating the true values of soil heavy metals at unsampled points. The best output map for each metal, selected on the basis of \( R^2 \), was used for delineation of heavy metal pollution by the geoenrichment index.

Interpretation of heavy metal pollution by geo-enrichment factor

The geoenrichment factor (EFG) is a geochemical index based on the assumption that under natural soil conditions, there is a linear relationship between a reference element and other elements. The reference elements used most often are Fe (Kartal et al. 2006), Al, Li, and TOC (Zhang et al. 2007; Gu et al. 2012; Li and Feng 2012a). In this study, Al, Li, and TOC concentrations in the soils were not analyzed, so Fe was used as a conservative tracer to differentiate natural from anthropogenic components (Kartal et al. 2006). Several authors have successfully used Fe to normalize heavy metal contaminants (Bhuiyan et al. 2011; Esen et al. 2010; Gu et al. 2012). The EFG is defined as follows:

$$ {\mathrm{EF}}_{\mathrm{G}}={\left(\mathrm{Me}/\mathrm{Fe}\right)}_{\mathrm{sample}\ \mathrm{map}}/{\left(\mathrm{Me}/\mathrm{Fe}\right)}_{\mathrm{background}\ \mathrm{map}} $$
(16)

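where \( {\left(\mathrm{Me}/\mathrm{Fe}\right)}_{\mathrm{sample}\ \mathrm{map}} \) is the map of the metal-to-Fe ratio in the samples of interest, and \( {\left(\mathrm{Me}/\mathrm{Fe}\right)}_{\mathrm{background}\ \mathrm{map}} \) is the map of the geochemical background metal-to-Fe ratio.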
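Equation 16 is applied cell by cell to the interpolated maps. A minimal Python sketch is given below, assuming each map is stored as a list of rows of equal size; the function names and the raster representation are assumptions for illustration, and any numbers passed in are hypothetical rather than values from this study.

```python
def enrichment_factor(me_sample, fe_sample, me_background, fe_background):
    """EF_G for one cell: (Me/Fe) in the sample over (Me/Fe) in the background."""
    return (me_sample / fe_sample) / (me_background / fe_background)


def ef_map(me_map, fe_map, me_bg_map, fe_bg_map):
    """Cell-by-cell EF_G for four rasters given as lists of rows."""
    return [
        [
            enrichment_factor(m, f, mb, fb)
            for m, f, mb, fb in zip(row_m, row_f, row_mb, row_fb)
        ]
        for row_m, row_f, row_mb, row_fb in zip(me_map, fe_map, me_bg_map, fe_bg_map)
    ]
```

A value of 1 means that the metal-to-Fe ratio equals the background ratio; larger values indicate enrichment relative to the background.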

Results and discussion

Descriptive statistics

A summary of the basic statistics of the investigated heavy metals and soil properties is presented in Table 1. The K–S test showed that the variables were normally distributed, with the exception of Cd, V, EC, and OM, and thus these raw datasets were logarithmically transformed before performing geostatistical analysis. Logarithmic transformation reduced the skewness and kurtosis values of Cd, V, EC, and OM, and the transformed datasets passed the log-normal tests. The geomean values of the heavy metal contents arranged in decreasing order were: Cr > Zn > Ni > Cu > Pb > Co > As > V > Fe > Cd (Table 1). Micronutrients such as Fe were present at the lowest levels in soils, whereas Cr and Zn were present at higher values. The coefficients of variation of EC, OM, Cd, As, clay, and sand were higher than 50 %, which implies that these variables had greater variation among the soil samples and thus were possibly influenced by extrinsic factors such as human activity. The average soil pH of 7.78 indicated that the soil in the study area was alkaline.

Table 1 Descriptive statistics of soil properties and heavy metals

Multivariate analyses

Heavy metals, soil properties, and land use grouping using AHC and Gs

AHC and Gs were used on both datasets of variables and land use sites, in order to identify clusters of land use, soil properties, and metals. AHC and Gs, as an iterative classification method, enabled the identification of three main groups (Fig. 2a): group A (agriculture, rocky, and urban), group B (rangeland and orchards), and group C (water). AHC and Gs were then applied to the heavy metals and soil properties within each of these three major groups. Results of AHC performed on groups A, B, and C are depicted in Fig. 2b–d. The dendrogram representing group A displayed three clusters (Fig. 2b). In the first cluster, As, pH, and sand were very well associated with each other (A-1 in Fig. 2a). The second cluster comprised Cd, which was also linked with Cu, EC, and OM. Soil properties (silt and clay) were also associated in the second cluster. In the third cluster, Co, Cr, Ni, Pb, V, Zn, and Fe were very well linked with each other. Heavy metals (As, Cd, and Cu) in the first and second clusters appear to be of anthropogenic origin. Fe is usually present in soils in relatively high concentration under natural conditions. Co, Cr, Ni, Pb, V, and Zn may represent mixed origins (anthropogenic and lithogenic). In the group B sites, As, Cd, Cu, Pb, pH, EC, OM, and clay were grouped together, while Fe, Co, Cr, Ni, V, and Zn, and sand and silt formed subgroups B-2 and B-3, respectively. In group C, As, Cd, Co, Cr, Zn, Fe, pH, EC, and OM formed a separate group, although also linked with Cu, Ni, Pb, V, and soil properties (sand, silt, and clay).

Fig. 2 Dendrogram showing clustering of a land use, b heavy metals and soil properties in group A, c group B, and d group C land use

Discriminating natural and anthropogenic sources using PCA

PCA (VARIMAX rotation mode) was performed separately for the three groups (A, B, and C) delineated by the AHC and Gs techniques, to compare the compositional patterns of the analyzed soil properties and metals and to identify the factors influencing each one. PCA of the three datasets yielded three PCs for group A sites, three PCs for group B sites, and two PCs for group C sites with eigenvalues >1, explaining 63.6, 65.6, and 96.8 % of the cumulative variance, respectively (Table 2). To visualize the data trends and the relationships between variables, loading and score plots were derived. For the group A sites, PC1 explained 36.24 % of the total variance and had strong positive loadings on Co, Cr, Ni, Pb, V, Zn, and Fe (Table 2). There were significant relationships between Co, Cr, Ni, Pb, V, Zn, and Fe. The geological structure of the area suggests lithogenic sources of these heavy metals. Micó et al. (2006) assessed heavy metal sources in soils of the European Mediterranean area, and observed that the metals Co, Cr, Zn, and Fe were in PC1 with maximum variance. These components were considered lithogenic, and the variability of these metals is determined by the parent rocks.

Table 2 Principal component matrix with eigenvalues, variability, and cumulative spatial variations of variables and factors

PC2, which explained 15.62 % of the total variance, had strong positive loadings for EC, OM, silt, and clay, and a strong negative loading for sand. This component accounted for soil organic matter and electrical conductivity. PC3, which explained 11.7 % of the total variance, had strong positive loadings for Cd and Cu, a strong negative loading for As, and a moderate loading for pH (Table 2). The main anthropogenic sources of these heavy metals (Cd, Cu, and As) are direct discharges from local point sources, such as industrial and urban discharges carrying metal contaminants. Industrial sources include chemical, petrochemical, and agricultural fertilizer industries, several mining (iron, copper, arsenic, cadmium, stone, etc.) activities, and agricultural activities involving the use of chemical fertilizers and municipal waste. In the group B dataset, PC1 was dominated by Co, Cr, Ni, Pb, V, Zn, Fe, EC, and clay, and accounted for 34.01 % of the total variance. The possible sources could be similar to those mentioned earlier. PC2, which explained 17.5 % of the total variance, had strong positive loadings on As and sand, and a strong negative loading on silt. PC3, which explained 14.08 % of the total variance, was dominated by Cd, Cu, soil organic matter, and electrical conductivity (Table 2). PC1 metal loadings are attributed to natural sources. In group C, PC1, which explained 89.95 % of the total variance, had strong positive loadings for Cr, V, Zn, and silt, and strong negative loadings for As, Cd, Co, Fe, pH, EC, OM, and clay. PC2, which explained 6.85 % of the total variance, showed strong loadings for Cu, Ni, Pb, and sand.

PC1 components may be controlled by natural factors, while PC2 components (Cu, Ni, and Pb) as mentioned earlier may have different sources of origin. It was also evident that the results of PCA support the findings of the AHC and correlation analysis.

Spatial data

The geology of the area is characterized by quaternary alluvium, orbitalin lime, shale, and marls. The soils of this region were shallow and semi-deep, with gravel and lime. The texture of these soils was light to medium (Fig. 3a). Background concentration of heavy metals in virgin areas showed that the mean concentration of V and Cr was naturally high in all parent rocks and the lowest concentration in bedrocks was for Cd (Table 3).

Fig. 3 Geology (a) and land use (b) maps of the study area

Table 3 Background concentration of metals in parent rocks of study region (mg kg−1)

A total of six land use classes (agriculture, orchards, rangeland, rocky, urban, and water) were recognized (Fig. 3b). Results of the accuracy assessment of the land use map are summarized in Table 4, which presents the errors of commission (user’s accuracy), the errors of omission (producer’s accuracy), and the overall accuracy. The error of commission is the proportion of pixels incorrectly identified as belonging to a particular class, whereas the error of omission is the proportion of pixels of a class that were assigned to other classes. The overall accuracy gives the proportion of correctly classified areas relative to the total validation area. The overall accuracy was found to be 89.5 %. When interpreting this number, one has to bear in mind the classification process and the six different types that were distinguished. The highest accuracies were obtained for agriculture (92 %), rocky land (87.6 %), and rangeland (86.6 %), while moderate accuracies of more than 70 % were obtained for orchards, urban areas, and water. Most of the classification error was associated with water and urban areas. Only 70.3 % of water pixels were correctly classified, while 13.2 % were misclassified as agriculture. Urban was misclassified as rangeland or agriculture.

Table 4 Error matrix for individual land use classes (errors of commission and user’s accuracy, errors of omission and producer’s accuracy, and overall accuracy)

Geostatistical analysis

Spatial distribution patterns

The ranges of the semivariograms for Ni, Cr, and As were much greater than those for V, Pb, and Fe (Table 5). The smallest range was that of V, at 20.74 km. Because this is well above the 5-km sampling interval, it confirmed the rationale for the sampling density chosen for the environmental survey of the heavy metals tested in this study.

Table 5 Best-fitted semivariogram models of heavy metals and their parameters

The spatial correlation in the available data was modeled, and the disjunctive kriging technique was used to map the metal contents and delineate polluted areas. The fitted semivariogram models of the heavy metals are presented in Table 5. The results showed that soil Zn data were fitted well with the rational quadratic model; Cd and Pb were fitted with the spherical model; Cu, Ni, and V were fitted with the circular model; and the other four heavy metals were all best fitted with the exponential model. The attributes of the semivariograms for each heavy metal are also summarized in Table 5. All of the Nug/Sill ratios for the ten metals were less than 0.72. In geostatistics, the Nug/Sill ratio is regarded as the criterion to classify the spatial dependence of soil attributes. Ratios of 0.25 and 0.75 are the two thresholds for the relative strength of spatial correlation. A variable with a ratio of less than 0.25 is strongly spatially dependent, a variable with a ratio between 0.25 and 0.75 is moderately spatially dependent, and a variable with a ratio greater than 0.75 is only weakly spatially dependent (Shi et al. 2007).
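The classification by Nug/Sill ratio described above can be expressed as a small Python function. This is a minimal sketch; the function name is hypothetical, the sill is taken as the total sill (nugget plus partial sill), and assigning a ratio of exactly 0.75 to the moderate class is an assumption, since the text does not state how the boundary is handled.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget-to-sill ratio.

    sill is the total sill (nugget + partial sill). Thresholds follow the
    0.25 and 0.75 limits quoted in the text; a ratio of exactly 0.75 is
    treated as moderate (an assumption).
    """
    ratio = nugget / sill
    if ratio < 0.25:
        return ratio, "strong"
    if ratio <= 0.75:
        return ratio, "moderate"
    return ratio, "weak"
```

For instance, a hypothetical nugget of 2.0 and sill of 4.0 give a ratio of 0.5, which falls in the moderate class.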

The Nug/Sill ratios of both V and Fe were less than 0.25, showing strong spatial dependence due to the effects of natural factors such as parent material, topography, and soil type (Table 5). The corresponding ratios for As, Cd, Co, Cr, Cu, Ni, Pb, and Zn, however, were between 0.25 and 0.75, indicating moderate spatial dependence and revealing that anthropogenic factors such as industrial production, fertilization, and other soil management practices had changed their spatial correlation after a long process of utilization.

The spatial patterns of the ten heavy metals generated from their semivariograms showed that all metals had distinct geographical distributions (Fig. 4). The spatial distribution maps showed dissimilar geographical trends, with high contents both in the northeast and southern areas.

Fig. 4 Spatial concentration maps of soil As, Cd, Co, Cr, Cu, Ni, Pb, V, Zn, and Fe produced by disjunctive kriging

Geoenrichment factor and pollutant assessment

Unlike the multivariate techniques, which were applied to the three major groups (A, B, and C), geoenrichment indices were calculated separately for all six land use classes, to delineate the polluted areas. The EFG values were interpreted in terms of heavy metal pollution as suggested by Birth (2003) and later adapted by Chabukdhara and Nema (2012). Birth proposed seven classes of enrichment (Table 6). Soils in the six land use classes in Hamedan showed a wide range of metal enrichment. In general, the mean predominant values showed moderate enrichment for As and Cd, and minor enrichment for Co, Cr, Cu, Ni, Pb, V, and Zn. Overall, the EFG values of these metals followed the sequence Cd > As > Pb > Cu > Zn > Ni > Cr > V > Co. Similarly, Ghrefat et al. (2011) and Chabukdhara and Nema (2012) reported the EF of Cd to be the highest among the metals in the sediments of Kafrain Dam, Jordan and of the Hindon River, India. From the pollution point of view, agriculture and orchard lands were highly polluted and showed moderately severe enrichment for As and Cd, but minor enrichment for Co, Cr, Cu, Ni, Pb, V, and Zn. The order of land use in terms of metal enrichment was: agriculture > orchards > rocky > rangeland > urban > water. None of the land use classes was entirely free from anthropogenic enrichment.

Table 6 Enrichment factor (EFG) of metallic elements in soils under different land use

The heavy metal pollution maps, plotted based on the reference values in Table 6, showed that As and Cd exhibited moderately severe pollution risk. The pollution patches developed from EFG are presented in Fig. 5. For soil As, the northeast of Kaboudarahang was the area with moderate to severe pollution, with levels reaching 3.06–5.85 mg kg−1. Overlaying the As contamination and land use maps revealed that agricultural lands on geological structures containing magmatic and metamorphic rocks, shale, and marl had high contamination. The areas with high concentrations of As are shown as red patches in Fig. 5. Soil Cd in general exhibited moderate to moderately severe pollution, with similar pollution patches, but was more serious than that of As and covered the wider southeast region of Hamedan. Besides Hamedan, the highest risk (of minor pollution) for Co was distributed in the north and northwest of the studied region (Kaboudarahang and Razan towns). The EFG values for Cr, Cu, Ni, V, and Zn were greater than 1 over a wide range of the study area, indicating minor pollution largely from human activities. The EFG values for Pb were higher than 1 in a major portion of the study area, revealing significant Pb contamination. Thus, areas with high heavy metal contamination would contribute to environmental pollution and ultimately threaten the health of humans and other living organisms.

Fig. 5 Spatial distribution map of heavy metals pollution (As, Cd, Co, Cr, Cu, Ni, Pb, V, and Zn)

Critical values of soil pollution by Cd, As, Cu, and Ni have been reported by Kabata-Pendias (2004). In most parts of the study area, the soil concentrations of these metals were higher than the critical levels, but the presence of lime in the soil will decrease the solubility and bioavailability of the metals. Elsewhere in the region, the low levels of metal contamination indicate that these areas can be regarded as safe.

Conclusion

This article is an original case study of Hamedan county (Iran), where soil contamination has not previously been studied. The present study was conducted to: (1) determine the spatial patterns of heavy metals in soils collected from Hamedan county under different geological structures and land uses; (2) identify their natural or anthropogenic sources by integrating multivariate analysis, gap statistics, and GIS; and (3) assess the level of heavy metal contamination in the topsoil based on EFG. The methods used to reach these aims were the calculation of basic statistical parameters, agglomerative hierarchical clustering, the gap statistic, correlation matrices, and principal components analysis, to characterize the spatial distribution and trace the sources of metallic elements, and finally an integrated GIS to interrelate the spatial patterns of pollution patches. Based on the increases in their variation rates in soil, heavy metals can be listed as Cd, As, Ni, Cu, Cr, Zn, Pb, V, Co, and Fe. The results of the multivariate and gap statistical analyses suggest an anthropogenic origin of As, Cd, and Cu, and a lithogenic origin of Co, Cr, Ni, Pb, V, Zn, and Fe. The results of the geostatistical approach confirmed the severity of pollution and its anthropogenic influence based on spatial variation in the level of contamination. Comparing the spatial distribution of geo-enrichment indices to auxiliary GIS layers (geology, land use, industries, mines, and background concentration of metals) suggests that Cd, As, and Cu far exceeded the safe limit in most of the land use classes.
Both natural factors (e.g., shale, sandstone, limestone, and metamorphic parent rocks and high background values) and anthropogenic factors (e.g., the discharge of industrial wastes, the exploitation of mines and mineral ores, and the high and unmanaged use of fertilizers: 500–700 kg ha−1 year−1 urea, 200–330 kg ha−1 year−1 potassium, and 300–558 kg ha−1 year−1 phosphorus fertilizers) contributed to the genesis of the pollution process. Although almost all the monitored land use classes suffered from heavy metal contamination, agricultural lands were the most polluted. This information will be helpful to land use planners and environmental risk managers who seek to encourage responsible, environmentally friendly economic development strategies.