Introduction

The unreasonable use of natural resources, including land, by humans has caused numerous ecological and environmental problems. Anthropogenic activities, such as excessive use of fertilizers and pesticides, sewage irrigation, and waste discharge, have significantly affected the soil environment. One of the drains most polluted by anthropogenic activities in Egypt is Bahr El-Baqar (Omran and Abd El Razek 2012). Saad (1997) concluded that 58% of the total drainage water of the Bahr El-Baqar drain originates from agricultural drainage, 2% from industrial drainage, and 40% from domestic and commercial drainage. The Bahr El-Baqar drain conveys the greatest part of the untreated wastewater into Lake Manzala through a very densely populated area spanning the Qalubyia, Sharkia, Ismailia, and Port Said Governorates. The soils of the Bahr El-Baqar region receive many kinds of pollutants, particularly heavy metals such as lead, cadmium, nickel, and mercury, which are considered the most hazardous (Park and Shin 2006). Soil heavy metal contents are a very useful indicator of environmental and soil quality, as they affect the physico-chemical characteristics of soil ecosystems. Although conventional methods of soil heavy metal determination are sufficiently accurate, they mostly rely on wet digestion of soil samples in hot concentrated acids followed by atomic absorption spectrometry or inductively coupled plasma spectrometry. These methods are therefore time-consuming and expensive, and they require environmentally damaging chemicals and qualified staff (Omran 2016). The cost of conventional soil characterization, as estimated by the U.S. National Soil Survey Center, is about US$2500 per pedon, and the analysis takes 6–12 months to complete (Brown et al. 2006; Rizzo et al. 2014). The development of an alternative, environmentally friendly method that is accurate, rapid, and inexpensive is therefore of great value.

Visible-near infrared-short wave infrared (VNIR-SWIR) reflectance spectroscopy is less expensive and faster than traditional wet chemical measurements (Summers 2009). The VNIR-SWIR spectral wavelengths (e.g., 350–2500 nm) act as predictor variables, and soil heavy metals are predicted with a statistical model such as partial least squares regression (PLSR). Two aspects can strongly influence the VNIR-SWIR-based determination of soil heavy metals: the spectral transformation/pre-processing technique and soil sample preparation (Stenberg et al. 2010). On the one hand, some researchers have examined the spectral curves of soil samples under various surface roughness and moisture conditions (Wu et al. 2009). Investigating such changes is useful because VNIR-SWIR spectra are known to carry comprehensive information on both chemical and physical phenomena. On the other hand, spectral pre-processing is considered an essential part of chemometric modelling; its application allows baseline shifts, light scattering, and non-linearities to be largely removed (Rinnan et al. 2009). To date, however, few studies have assessed the data distribution at each wavelength after different spectral transformations, despite the important role this plays in the correlation analysis between spectra and soil heavy metals. Unfortunately, the VNIR region is dominated by weak overtones and combinations of vibration bands of atoms with strong molecular bonds (nitrogen, oxygen, and carbon bonded to hydrogen), and metals show no absorption features in this region. These vibrational modes are modulated when metals form complexes with organic molecules containing C–H, N–H, and O–H bonds (Malley and Williams 1997).

Heavy metals as soil contaminants commonly occur at low concentrations (< 10,000 mg kg−1 dry soil). Inorganic materials generally do not show characteristic absorption features in the VNIR-SWIR wavelength region (Wu et al. 2005; Schwartz et al. 2012; Pandit et al. 2010; Vohland et al. 2009; Zhang et al. 2010; Yaolin et al. 2011). It is therefore considered difficult to detect inorganic soil contaminants directly, in general, and heavy metals in particular (Winkelmann 2005). Peng et al. (2013) concluded that the effect of organic matter (OM) on the quantitative estimation of iron oxide content in soil is critical, especially when OM content exceeds 20 g kg−1. Variation in OM content therefore has a vital impact not only on the quantitative estimation of OM itself but also on that of other soil properties. When OM content exceeds 20 g kg−1, effective techniques must be adopted to mitigate this effect (Peng et al. 2013).

An “inference approach” based on the key soil constituents that absorb VNIR-SWIR radiation could therefore become a powerful tool to enhance the accuracy of heavy metal prediction. While pure metals do not absorb in the VNIR-SWIR region, they might be detected indirectly through their complexation with OM, their association with detectable moieties such as hydroxides, sulfides, carbonates, or oxides, or their adsorption to clays that absorb light in this wavelength range (Schwartz et al. 2012). The objective of this study is therefore to propose an inference model for predicting heavy metals (Cd, Cu, Pb, Zn, Ni, Mn, Cr, Fe) from VNIR-SWIR spectra of soil samples collected in the Bahr El-Baqar region, Egypt. The study is also geared towards assessing the feasibility of different VNIR-SWIR pre-processing transformations for soil heavy metal inference. It endeavors to answer two main questions: (1) Can VNIR-SWIR spectroscopy be implemented as an alternative, fast, and cost-effective method for predicting soil heavy metals at low concentrations? and (2) Can successful calibration models for heavy metal inference be established with a small number of samples collected at farm scale?

Reflectance spectra of heavy metals

Inorganic materials such as heavy metals, which do not show characteristic absorption/reflection features, cannot be detected directly in the VNIR-SWIR region (Schwartz et al. 2012; Pandit et al. 2010; Zhang et al. 2010; Yaolin et al. 2011). Schwartz et al. (2012) proposed that soil contamination may be detected indirectly via association with moieties that are detectable in the VNIR-SWIR region. Soil heavy metals may thus be predicted from spectra indirectly, either through primary correlations with well-recognized absorptions (e.g., those of clay minerals, OM, or iron oxides) or through secondary correlations with properties related to the soil constituents that absorb VNIR-SWIR radiation (such as pH and cation exchange capacity). Various spectroscopic techniques for evaluating heavy metal contamination have been reported (Wu et al. 2007; Choe et al. 2008; Minasny et al. 2009; Pandit et al. 2010). Table 1 summarizes previous results of soil heavy metal prediction via VNIR-SWIR spectroscopy.

Table 1 Summary results of soil heavy metals prediction via VNIR-SWIR spectroscopy

To predict soil contamination accurately from a spectral signal, it is first necessary to understand the relationships between spectrally active constituents and heavy or trace metals, since these constituents have a direct and significant effect on reflectance (Wu et al. 2007; Minasny et al. 2009). OM, clay minerals, and iron oxides were principally responsible for predicting a range of heavy metals (Cd, Cu, Fe, Mn, Ni, Pb, and Zn) based on multivariate regression. Zhang et al. (2010) suggested that heavy metals such as Pb can be detected directly and can also be estimated indirectly with reflectance spectroscopy via correlations with iron oxide contents. Kemper and Sommer (2002) found a strong relationship between iron and heavy metal concentrations, including Pb, using multiple linear regression (MLR) and an artificial neural network. Wu et al. (2007) found that Fe had a high correlation and a high cross-validation statistic with heavy metals, while Vohland et al. (2009) found strong correlations of Pb and other heavy metals with Fe and clay fractions. Yaolin et al. (2011) found a good correlation between OM, Pb concentration, and laboratory-based data in the range 564–624 nm. Moreover, metal levels (Cd, Hg, Pb) were significantly related to the spectral domain 500–700 nm, which is associated with OM (Bray et al. 2009; Ji et al. 2010; Pandit et al. 2010). Some authors reported good predictions of Cd, Co, and Zn (Wu et al. 2007), while others reported poor predictions (Wu et al. 2007; Malley et al. 2004). Contradictory results have also been reported for Cr, Cu, Ni, and Pb (Malley et al. 2004; Wu et al. 2007). Figure 1 shows the main characteristic spectral signatures, the corresponding soil attributes, and selected major spectral regions for some soil properties.

Fig. 1 Reflectance spectra of a soil sample collected from the Bahr El-Baqar region, indicating the spectral features of the most important constituents

Materials and methods

Study area, soil sampling, and analysis

Environmental and land resources in the Bahr El-Baqar region face critical problems due to increasing population, depletion of natural resources, and heavy metal pollution, among others. A total of 35 soil samples were collected in 2014 from the study area, which is located in the Bahr El-Baqar region of northern Egypt, between longitudes 32°05′ and 32°16′ E and latitudes 30°56′ and 31°07′ N (Fig. 2). As few as 25 samples can produce good predictions at the farm or field scale (Wetterlind et al. 2008), whereas 100 to 200 samples may be the lower limit for a model intended to cover a large geographical area with many diverse soil types (Wetterlind et al. 2013). Outliers were examined and one sample was removed, leaving 34 samples for soil heavy metal modeling and analysis. The samples were air-dried, crushed with a wooden pestle and mortar, and passed through a 0.5 mm sieve.

Fig. 2 Satellite image and sample locations for the study area (Omran 2016)

Nine metals were selected based on their toxicity and environmental relevance, relation to local geology, irrigation activities, and data completeness and variability: lead (Pb), zinc (Zn), manganese (Mn), iron (Fe), cadmium (Cd), copper (Cu), chromium (Cr), cobalt (Co), and nickel (Ni). Soil pH and salinity were also considered alongside these parameters, since they are important environmental indicators. Clay content and OM were selected as important parameters for the indirect prediction of heavy metals. Soil pH was measured in deionized water (pHw) and in 0.01 M CaCl2 (pHCaCl2), both in 1:2.5 suspensions. Total organic carbon (TOC) was determined by wet digestion (Walkley and Black 1934).

Overall methodology

The overall methodology comprises the following steps: (1) setting up a representative sample set through the soil sampling strategy and chemical analysis; (2) analyzing the samples following standard procedures; (3) developing a procedure for preparing soil samples before spectroscopic measurement; (4) dividing the sample set into three groups: training, validation, and test. The relationship between soil heavy metals and the spectroscopic data is established with the training group and simultaneously cross-validated with the validation group; finally, the model is applied to the test group, which is independent of the training and validation process; (5) developing a predictive model for each soil heavy metal (spectral data pre-treatment and calibration); and (6) evaluating the repeatability and reproducibility of the method. Figure 3 shows the overall methodology developed to achieve the study aim and build the most accurate models for predicting soil heavy metals.
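The three-way split in step (4) can be sketched as follows. This is a minimal illustration assuming a simple random split; the text does not say how samples were assigned to the groups, so the helper name `three_way_split`, the 20/7/7 group sizes, and the random seed are hypothetical.

```python
import numpy as np

def three_way_split(n_samples, n_val, n_test, seed=0):
    """Randomly partition sample indices into training, validation and test groups.

    Hypothetical helper: group sizes and seed are illustrative assumptions,
    not values reported in the study.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    test = idx[:n_test]                   # kept aside until the final model check
    val = idx[n_test:n_test + n_val]      # used to cross-validate the calibration
    train = idx[n_test + n_val:]          # used to fit the spectra-metal relationship
    return train, val, test

# 34 samples remained after outlier removal; the 20/7/7 split is hypothetical.
train, val, test = three_way_split(34, n_val=7, n_test=7)
print(len(train), len(val), len(test))  # 20 7 7
```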

Fig. 3 The overall methodology used to predict heavy metals in Bahr El Baqar soils using spectroscopy and chemometric techniques

The original soil reflectance spectra, consisting of 2151 values (at 1 nm intervals) between 350 and 2500 nm, were averaged over 10 nm intervals. The new spectrum consisted of reflectance values at 350, 360, 370, …, 2500 nm, inclusive. For example, the reflectance at 400 nm was the average of the reflectance values from 396 to 405 nm. This procedure resulted in a new dataset with 214 wavelengths, approximately one tenth of the original data volume, while retaining sufficient spectral information.
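A minimal sketch of the 10 nm averaging, assuming the spectra are stored as a NumPy array of shape (samples, 2151). The window [c − 4, c + 5] nm follows the 396–405 nm example for 400 nm; the function name is hypothetical. Centres from 350 to 2500 nm in 10 nm steps number 216, with truncated windows at both ends; excluding those two edge windows would leave 214 centres (360–2490 nm).

```python
import numpy as np

def resample_10nm(wavelengths, reflectance, start=350, stop=2500, step=10):
    """Average 1 nm reflectance spectra into 10 nm bands.

    The window for a centre c is [c - 4, c + 5] nm (e.g. 396-405 nm for 400 nm).
    The edge windows at 350 and 2500 nm are truncated to the measured range.
    """
    wavelengths = np.asarray(wavelengths)
    reflectance = np.atleast_2d(np.asarray(reflectance, dtype=float))  # (samples, bands)
    centres = np.arange(start, stop + 1, step)
    resampled = np.empty((reflectance.shape[0], centres.size))
    for j, c in enumerate(centres):
        in_window = (wavelengths >= c - 4) & (wavelengths <= c + 5)
        resampled[:, j] = reflectance[:, in_window].mean(axis=1)
    return centres, resampled

# Usage with synthetic data: 3 spectra measured every 1 nm from 350 to 2500 nm.
wl = np.arange(350, 2501)  # 2151 wavelengths
spectra = np.random.default_rng(1).uniform(0.1, 0.6, size=(3, wl.size))
centres, resampled = resample_10nm(wl, spectra)
print(resampled.shape)  # (3, 216)
```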

Pre-processing and transformation methods

Several pre-processing techniques were used to reduce noise, enhance spectral features, or both. Scatter-corrective methods and spectral derivatives are the most widely used pre-processing techniques in spectroscopy (Rinnan et al. 2009). Scatter-corrective methods include multiplicative scatter correction (MSC), inverse MSC, extended MSC (EMSC), extended inverse MSC, de-trending, standard normal variate (SNV), and normalization. Because they correct for scatter-induced baseline shifts between samples, these techniques are intended to reduce the (physical) variability between samples. MSC is probably the most widely used pre-processing technique, followed by SNV and derivatives. In MSC, each spectrum is regressed against a reference spectrum (usually the mean spectrum) and then corrected so that all samples appear to have the same scatter level as the reference (Vohland et al. 2009). Martens et al. (1983) proposed using only those parts of the spectral axis that contain no relevant information (the baseline) to estimate the correction. Such regions are difficult to identify in practice, especially in near infrared (NIR) measurements, where the signals of different chemical components strongly overlap and are correlated, and little or no true baseline exists. This is why, in most cases, the entire spectrum is used to estimate the scalar correction parameters in MSC. SNV is probably the second most applied scatter-correction method (Barnes et al. 1989). MSC and SNV give nearly equivalent results in most practical applications.
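The two scatter corrections can be sketched in NumPy using their textbook definitions: SNV centres and scales each spectrum by its own mean and standard deviation, and MSC regresses each spectrum on a reference (here the mean spectrum) over the whole range and removes the fitted offset and slope. This is an illustration, not the implementation in The Unscrambler, and the function names are illustrative.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x_i is regressed on the reference r over the whole spectral range,
    x_i ~ a_i + b_i * r, and corrected as (x_i - a_i) / b_i. The reference
    defaults to the mean spectrum.
    """
    X = np.asarray(spectra, dtype=float)
    r = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(r, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected

# Usage: three synthetic spectra sharing one shape but with different offsets and scaling.
wl = np.linspace(0.0, 3.0, 50)
base = np.sin(wl) + 2.0
X = np.vstack([0.1 + 1.0 * base, -0.2 + 1.5 * base, 0.3 + 0.8 * base])
X_snv, X_msc = snv(X), msc(X)
```

Because the synthetic spectra differ only by an additive offset and a multiplicative factor, both corrections map them onto a common shape.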

The spectral derivative group is represented by the Norris-Williams (NW) derivative and the Savitzky-Golay (SG) polynomial derivative. Derivatives can remove both additive and multiplicative effects in the spectra. Vasques et al. (2008) found that derivative procedures were the best pre-processing transformations. Derivatives of the spectra may allow relevant information to be extracted from the near-infrared range (Ertlen et al. 2010). Both derivative techniques include smoothing so that the signal-to-noise ratio of the corrected spectra is not reduced too much. The parameters of the algorithm (interval width, polynomial order) should be chosen carefully to avoid amplifying spectral noise (Nicolaï et al. 2007). In the SG method, the derivative at the centre point of a window is found by fitting a polynomial to the raw data in a symmetric window around that point. Once the polynomial coefficients are calculated, the derivative of any order can easily be obtained analytically, and this value is used as the derivative estimate for the centre point. Both the number of points used to fit the polynomial (window size) and the degree of the fitted polynomial must be chosen. Too small a moving window introduces large artifacts into the corrected spectra and reduces the signal-to-noise ratio. Conversely, the larger the window, the smaller the distinction between full-spectrum and moving-window pre-processing.
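A Savitzky-Golay derivative can be computed with SciPy's `savgol_filter`, whose `window_length`, `polyorder`, and `deriv` arguments correspond to the window size, polynomial degree, and derivative order discussed above. This is a sketch: the 11-point window and second-order polynomial are illustrative assumptions, not the settings used in the study, and the Norris-Williams derivative is not shown.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_derivative(spectra, window_length=11, polyorder=2, deriv=1, spacing_nm=10.0):
    """Savitzky-Golay derivative along the wavelength (last) axis.

    A polynomial of degree `polyorder` is fitted in a symmetric window of
    `window_length` points around each band, and its analytical derivative at
    the centre point is returned. delta=spacing_nm expresses the result per nm.
    """
    return savgol_filter(np.asarray(spectra, dtype=float), window_length=window_length,
                         polyorder=polyorder, deriv=deriv, delta=spacing_nm, axis=-1)

# Usage: on a quadratic "spectrum" sampled every 10 nm, SG derivatives are exact.
wl = np.arange(350, 2501, 10, dtype=float)
y = 1e-6 * (wl - 1400.0) ** 2
d1 = sg_derivative(y, deriv=1)  # should equal 2e-6 * (wl - 1400)
d2 = sg_derivative(y, deriv=2)  # should equal 2e-6 at every band
```

Larger windows smooth more but blur narrow absorption features, which is the trade-off described above.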

Pre-processing methods such as MSC, SNV, Savitzky-Golay, continuum removal (CR), and derivatives are widely used with multivariate calibration techniques to identify the best data representation (Gholizadeh et al. 2015a). CR minimizes brightness differences and emphasizes the absorption bands of the spectra. Spectroscopic measurements were performed in transmission mode; such measurements can be converted to absorbance and quantified using Beer's law (Nicolaï et al. 2007). If the dataset does not follow the Lambert–Beer law (a linear relationship between absorbance and concentration), additional PLS factors can compensate for this non-ideal behavior of the spectral predictors (Martens and Næs 1989). The disadvantage of including such additional factors is increased model complexity and, in turn, most probably reduced model robustness for future predictions. Combinations of pre-processing methods are common in the literature (Rinnan et al. 2009).
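Continuum removal divides each spectrum by its upper convex hull, so that points touching the hull have a value of 1 and absorption bands fall below 1. The sketch below is a plain NumPy illustration of that idea (the study used ENVI 5.3; this is not ENVI's implementation), with a hypothetical function name and a synthetic spectrum.

```python
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum by its upper convex hull (continuum removal).

    The hull is built left to right with Andrew's monotone-chain algorithm:
    the last hull vertex is dropped while it lies on or below the line from the
    previous hull vertex to the new point.
    """
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float)
    hull = []  # indices of upper-hull vertices, in increasing wavelength
    for i in range(x.size):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (x[a] - x[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (x[i] - x[o])
            if cross >= 0:  # a is on or below the line o -> i: not an upper-hull vertex
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(x, x[hull], y[hull])  # piecewise-linear continuum
    return y / continuum

# Usage: a sloping synthetic spectrum with one absorption band centred at 1400 nm.
wl = np.linspace(400, 2400, 201)
continuum_line = 0.3 + 1e-4 * (wl - 400)
spectrum = continuum_line - 0.08 * np.exp(-((wl - 1400) / 50) ** 2)
cr = continuum_removed(wl, spectrum)
```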

Soil heavy metals prediction

The application of VNIR-SWIR spectroscopy to soil heavy metal prediction does not involve any (hazardous) chemicals. It is a non-destructive technique well suited to analyzing some of the essential constituents of soil (Wetterlind et al. 2013). There are two approaches to predicting heavy metals in soils: (1) a supervised approach, in which the chromophore absorption band in question is known and below saturation, and (2) an unsupervised approach, in which the chromophore absorption band is unknown and is affected by several factors. Whereas the first approach is valid for the infrared (IR) region when selected wavebands and pre-treatment are used to prevent saturation, the second represents the case of reflectance spectroscopy across the VNIR-SWIR region. For the second approach, multivariate statistical techniques (chemometrics) are required to extract the information about the quality attributes that is hidden within the spectra. Essentially, this involves regression techniques coupled with spectral pre-processing.

Pearson correlation coefficients were used to identify correlations between the selected heavy metals (Cd, Cr, Cu, Fe, Mn, Ni, Zn, Co, Pb), iron oxides, clay, and OM. Multivariate analyses of the soil spectral data were performed using principal component analysis (PCA) and partial least squares regression (PLSR). PCA identifies patterns in the spectral data, allowing samples to be clustered and the number of dimensions to be reduced with minimal loss of information. PLSR is a chemometric technique (e.g., Conforti et al. 2013) suited to situations with many highly collinear predictor variables. PLSR is based on a latent-variable decomposition of two sets of variables: the independent variables, X (spectral reflectance bands), and the dependent variables, Y (heavy metals). A limited number of PLSR factors is selected to explain most of the variation in both predictors and responses. PLSR thus compresses the information in the original X variables into a few variables known as PLS components (latent variables) (Conforti et al. 2013).

PLSR analysis was carried out using The Unscrambler v. 10.4 software (CAMO Software Inc., Woodbridge, NJ, USA). The soil data obtained by conventional analytical methods were added to the corresponding spectral files using the WinISI II (Ver. 1.04) software (FOSS NIRSystem/TECATOR, 1999). MATLAB (version 8.1.347) and PLS-Toolbox 4.2 were used to fit PLSR calibrations with leave-one-out cross-validation. ENVI 5.3 software (Exelis Visual Information Solutions 2012) was used to perform continuum removal. TSG software (http://www.thespectralgeologist.com) was used to visualize and quantify spectral variability.

Goodness-of-fit

The following statistics were used to assess the predictive ability of the calibration and validation models: the coefficient of determination in calibration (R²c) and validation (R²v), the standard error of cross-validation (SECV) in calibration, and the standard error of prediction (SEP) in validation. A model was considered satisfactory when R² > 0.70 (Morón and Cozzolino 2003). The predictive capacity of the models was assessed using the ratio of performance to deviation (RPD), i.e., the standard deviation of the reference values in the validation set divided by the SEP (González-Martín et al. 2007). This ratio, also called the residual prediction deviation, was computed for both the calibration (RPDc) and validation (RPDv) models (Pirie et al. 2005). The goodness-of-fit of the inference model is determined by the R² and RMSEP values obtained by plotting observed against predicted concentrations for each variable. RMSEP is the standard deviation of the prediction residuals and is computed as the square root of Σ(observed − predicted)²/N (Umetrics 2002). Following Viscarra Rossel (2008), RPD values were classified as follows: RPD between 1.5 and 2.0 indicates poor predictions, RPD between 2.0 and 2.5 indicates good predictions, and RPD > 2.5 indicates very good to excellent predictions.
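These statistics can be computed as follows. This is a minimal sketch with hypothetical function names: R² is taken as 1 − SS_res/SS_tot (some software instead reports the squared correlation between observed and predicted values), RPD is computed as the standard deviation of the reference values divided by RMSEP (SEP is sometimes defined with a bias correction), the boundary values 2.0 and 2.5 are assigned to "good" as an assumption, and RPD values below 1.5, which the cited scheme does not classify, are labelled as such.

```python
import numpy as np

def goodness_of_fit(observed, predicted):
    """Return R2, RMSEP and RPD for one set of observed vs predicted values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred
    rmsep = np.sqrt(np.sum(residuals ** 2) / obs.size)  # sqrt(sum((obs - pred)^2) / N)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((obs - obs.mean()) ** 2)
    rpd = obs.std(ddof=1) / rmsep  # SD of reference values / prediction error
    return r2, rmsep, rpd

def rpd_class(rpd):
    """RPD classes following the scheme cited in the text (Viscarra Rossel 2008)."""
    if rpd > 2.5:
        return "very good/excellent"
    if rpd >= 2.0:
        return "good"
    if rpd >= 1.5:
        return "poor"
    return "below 1.5 (not classified in the text)"

# Usage with hypothetical concentrations (mg/kg):
r2, rmsep, rpd = goodness_of_fit([10, 20, 30, 40, 50], [12, 18, 33, 39, 48])
print(round(r2, 3), round(rmsep, 3), rpd_class(rpd))  # 0.978 2.098 very good/excellent
```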

Results and discussion

Heavy metal properties of the Bahr El Baqar soils

Errors in the traditional chemical analyses to which the spectra are correlated are carried into the calibration model, so a statistical analysis of the soil data before calibration is important. The mean, standard deviation, and range of each heavy metal are presented in Table 2. The average pH of the Bahr El Baqar soils is about 8.0 because of the calcareous parent material. The relatively high soil organic matter content (average 1.6%) is mainly related to the high organic matter flux to the soil from the direct discharge of domestic and industrial wastewater. The dissolution of humic acids at high pH is responsible for the release of Cu and Pb from the soil. Organic matter is important for the retention of metals by soil solids, thus decreasing their mobility and bioavailability.

Table 2 Summary descriptive statistics of representative physico-chemical properties of the Bahr El Baqar soils

The summary statistics indicate that the mean concentrations of soil Pb, Cu, Cr, and Zn are 36.64, 65.70, 106.96, and 90.56 mg kg−1, respectively, and that the ranges between the minimum and maximum values are 27.98, 272.58, 49.14, and 175.91 mg kg−1, respectively. Cu and Zn show large variations and high skewness (2.27 and 0.76, respectively). The CV (%) values reflect the relative variation of each element among the sampling sites. The CVs ranked from high to low as Cu > Zn > Fe > Mn > Co > Cr > Ni > Pb > Cd, indicating that Cu and Zn vary more in the soil than the other metals.
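The CV and skewness values follow standard definitions, sketched below. The text does not state which skewness estimator was used; this sketch assumes the sample-adjusted Fisher-Pearson coefficient (`scipy.stats.skew` with `bias=False`), the function name is hypothetical, and the concentrations are made up, not the Table 2 data.

```python
import numpy as np
from scipy.stats import skew

def describe(values):
    """Mean, SD, range, CV (%) and sample-adjusted skewness of one element."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)
    return {
        "mean": mean,
        "sd": sd,
        "range": x.max() - x.min(),
        "cv_percent": 100.0 * sd / mean,
        "skewness": skew(x, bias=False),
    }

# Hypothetical concentrations (mg/kg), NOT the Table 2 data.
samples = {
    "Cu": [20.0, 25.0, 30.0, 40.0, 60.0, 150.0],
    "Pb": [30.0, 33.0, 35.0, 37.0, 40.0, 45.0],
}
summary = {metal: describe(v) for metal, v in samples.items()}
ranking = sorted(summary, key=lambda m: summary[m]["cv_percent"], reverse=True)
print(ranking)  # ['Cu', 'Pb']
```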

Spectral signatures of Bahr El Baqar soils

Figure 4 (left) shows the spectra of all soil samples measured in the laboratory, which have the typical shape of soil reflectance in each wavelength region. Reflectance is lower in the visible range (350–650 nm) and higher in the near infrared, with specific absorption bands around 1400, 1900, and 2200 nm. A promising relationship between heavy metal concentrations and reflectance spectra was observed for the samples. Figure 4 (right) shows spectra of Bahr El Baqar soils with different heavy metal concentrations. The spectral signatures of the soil samples changed significantly as a function of heavy metal concentration. It was hypothesized that the samples with the highest concentrations would have the lowest reflectance, and that reflectance would increase proportionally as heavy metal concentrations decreased.

Fig. 4 Reflectance spectra of the soil samples (left) and spectra showing the wavelength positions for different heavy metal concentrations (right)

To determine the relationships among soil parameters, Pearson correlation coefficients were calculated (Table 3). Table 3 shows the correlation coefficients between the heavy metals, clay, iron oxides, organic matter, and reflectance at 400, 800, 1200, and 1600 nm. Fe is strongly correlated with Mn and moderately correlated with Cu and Pb (r = 0.819, 0.488, and 0.479, respectively). Ni and Co are also strongly correlated (r = 0.966). Cu is significantly correlated with Zn and Pb (r = 0.678 and 0.544, respectively; Table 3), which suggests that these metals share a common source of contamination. Ni, Fe, Mn, Cu, Zn, and Co are positively correlated with OM, pH, and clay. Clay is highly correlated with Fe, Mn, Cu, and Zn (r = 0.773, 0.658, 0.786, and 0.709, respectively). As can be seen in Table 3, pH is negatively correlated with each of OM, clay, Cu, and Zn, since increasing clay and OM contents increase the total acidity of these soils. By contrast, there is a positive correlation between pH and Cu and Zn. A significant relationship was found between Cd and Cu contents and clay content. Considering all soil samples, Pb, Ni, and Cr contents are not significantly related to clay content.
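Whether a given r is significant depends on the number of samples. The sketch below computes the critical |r| for a two-sided test of zero correlation, assuming the Table 3 coefficients were computed on the 34 retained samples (the text does not state n or the significance level), and checks a few coefficients quoted above against it. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Smallest |r| significant in a two-sided test of H0: rho = 0 with n samples.

    Obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r**2) for r at the critical t.
    """
    df = n - 2
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    return t_crit / np.sqrt(df + t_crit ** 2)

r05 = critical_r(34)              # about 0.34
r01 = critical_r(34, alpha=0.01)  # about 0.44
for label, r in [("Fe-Cu", 0.488), ("Fe-Pb", 0.479), ("Cu-Pb", 0.544)]:
    if abs(r) >= r01:
        verdict = "significant at 0.01"
    elif abs(r) >= r05:
        verdict = "significant at 0.05"
    else:
        verdict = "not significant"
    print(label, r, verdict)
```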

Table 3 Correlation coefficients of metal concentrations, organic-matter content, clay, and reflectance (Ref.) at 400, 800, 1200 and 1600 nm

Overall, Table 3 shows no strong direct correlations between heavy metals and reflectance at 400, 800, 1200, and 1600 nm. The linear relationship between heavy metals and reflectance is significant at 400 and 800 nm but not at 1200 and 1600 nm. Therefore, pre-processing the spectra and selecting important wavelengths from PLSR are important. Earlier research has shown that calibration models built on spectra that are not pre-processed are more sensitive to changes in environmental conditions than models built on pre-processed spectra (Moros et al. 2009), so pre-processing can be used to avoid this problem. Because the spectral data are numerous and highly correlated, band selection or band transformation is a prerequisite for heavy metal inference.

Validation of the important wavelengths and pre-processing methods for PLSR

In this study, specific spectra measured in the laboratory were used to predict metal concentrations with different pre-processing methods (Table 4). The optimum calibration models were identified by the lowest standard error of cross-validation (SECV) and standard error of prediction (SEP) and the highest coefficient of determination (R²). The ratio of the standard deviation of the reference data to the standard error of prediction (RPD) was used to evaluate prediction accuracy.

Table 4 Validation of the most important and significant spectral ranges (nm) for heavy metals prediction

Table 4 gives the results of using PLSR to predict Ni, Cr, Pb, Zn, Cu, Cd, Mn, and Co from the specific spectral values. The pre-processed spectra of the samples were used to describe the chemical variables. With the exception of Cu, Cd, Co, and Fe, all of these variables have significant prediction wavelengths, ranked by model quality in the order Mn > Pb > Zn > Cr > Ni (Table 4).

Mn, Zn, Pb, Cu, and Ni had coefficients of determination of 0.62, 0.66, 0.66, 0.60, and 0.69, respectively, for the validation samples; since these values are below the R² > 0.70 criterion, the partial least-squares calibration models have low predictive power for estimating metal abundance. Based on Table 2, Cu and Zn have larger variation (CV) than the other heavy metals. Co, Cr, and Cd had low coefficients of determination (0.55, 0.52, and 0.59, respectively) for the validation samples. Based on Table 4, specific spectral pre-processing increases the prediction accuracy for all heavy metals. Most of the pre-processing methods increased the prediction accuracy for these heavy metals, consistent with the conclusion drawn by Ren et al. (2009).

Figure 5 shows validation results for heavy metal concentrations using different spectral pre-processing methods. According to the criteria of minimal RMSEcv and maximal R²cv, the second-derivative method was the best technique for predicting Mn, Zn, Cd, and Cr, whereas the CR pre-processing method was the best algorithm for Cu, Ni, and Co. The first-derivative method was the best technique for Pb only. Although pre-processing of the specific spectra improved prediction ability, it remained below the satisfactory threshold (R² > 0.70).

Fig. 5 Validation and prediction results of the pre-processing models for the heavy metals concentration (in mg/kg)

Heavy metals inference using OM, Al2O3, and Fe2O3

Because of the interference effects of organic matter and the low heavy metal concentrations in all soil samples (Table 2), selecting the most sensitive original spectral bands did not yield highly accurate predictions (Table 4; Fig. 5). Organic matter masks spectral signatures, making it difficult to identify metal quantities at these wavelengths (Fig. 6). It is therefore important to predict heavy metals indirectly, by first relating them to OM, iron oxides (Fe2O3), or clays. Figure 6 shows the heavy metal inference using OM, Al2O3, and Fe2O3.

Fig. 6 Continuum removal of spectra for heavy metals inference using organic matter (OM), Al2O3, and Fe2O3

To identify the specific wavelengths for predicting the studied heavy metals, the regression coefficients (Table 5) obtained with OM, Al2O3, and Fe2O3 were used. The most significant peaks for Zn prediction are at 430, 570, and 1700 nm; for Cu at 538, 1259, 1500, and 2184 nm; for Pb at 440, 578, 915, and 1894 nm; for Cr at 860, 1001, and 2376 nm; for Ni at 870, 909, and 2240 nm; for Cd at 630 and 1270 nm; and for Co at 1900 and 2300 nm.

Table 5 The regression coefficients obtained with partial least squares regression analysis between heavy metals and Fe2O3, Al2O3, and OM

Absorption at 1500 nm is associated with amine N–H vibrations in organic components (Tekin et al. 2012). Absorption at 1900 nm is associated with H–O–H bending and O–H stretching vibrations in montmorillonite (Viscarra Rossel et al. 2006). Absorption at 2170 nm is associated with Al–OH bands in albite, and absorption at 2240 nm with Fe–OH bands in iron minerals (Gaydon et al. 2009). The absorption near 2300 nm is associated with Mg–OH bending and O–H stretching vibrations in soil minerals such as chlorite, illite, and vermiculite (Viscarra Rossel et al. 2006). These significant wavebands support the assumption that heavy metals are measured with VNIR-SWIR through their co-variation with water, clay, organic carbon, and some minerals in the soil (Wu et al. 2007).

The heavy metals have different correlation coefficients with the spectra; their most strongly correlated bands differ and indicate how each metal is bound. Based on their correlation features (Tables 5, 6), the heavy metals can be divided into five groups. Group I contains Cr, which is related to iron oxides, clays, and organic matter. Cr shows high spectral correlation at the 2376 nm band, which is related to Fe(Mg, Al)–OH absorption from iron oxides and clays and to C–H absorption from organic matter, reflecting its binding to all of these fine soil constituents. Group II contains Ni and Mn, which are related to iron oxides and clays. Ni and Mn show high spectral correlations at the 870 and 2240 nm bands, respectively. Group III contains Cu, which is related to iron oxides. Cu correlates strongly with the band resulting from Fe3+ absorption at 538 nm, suggesting that binding to iron oxides is the most important form of Cu. Group IV contains Cd and Pb, which are related to organic matter. Cd and Pb correlate well only with reflectance at 630 and 578 nm, respectively, within the 500–700 nm region, which is strongly related to the total carbon content of the soils; this suggests that Cd and Pb are strongly associated only with organic matter, not with iron oxides or clays. Group V contains Co, which is related to clays; Co shows high spectral correlation at the 1900 nm band. Zn correlates strongly with the spectral region around 1700 nm, which is related to the first overtone of the C–H stretch of organic matter; binding with organic matter is thus the dominant form of Zn.

Table 6 Prediction equations obtained by regression analysis between the specific wavelengths and heavy metals, using the Fe2O3, Al2O3, and OM equations

For example, the Pb concentration can be predicted with the following equations:

$$\text{Pb (ppm)} = 0.0775\,\text{OM} + 35.286 \tag{1}$$
$$\text{OM (\%)} = 1.344 + 0.0197\,\text{TS}_{\text{crDOA600}} \tag{2}$$
$$\text{Pb (ppm)} = 0.0775\,\left(1.344 + 0.0197\,\text{TS}_{\text{crDOA600}}\right) + 35.286 \tag{3}$$
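Substituting Eq. (2) into Eq. (1) is simple arithmetic, and multiplying it out gives Pb (ppm) = 35.39016 + 0.00152675 TS_crDOA600. The short Python sketch below checks this numerically. The coefficients are those of Eqs. (1) and (2); the function names are hypothetical, and the sketch treats TS_crDOA600 only as the spectral predictor variable defined in Table 6, without assuming how it is computed.

```python
# Coefficients of Eqs. (1) and (2) for Pb; function names are illustrative only.
PB_SLOPE, PB_INTERCEPT = 0.0775, 35.286   # Eq. (1): Pb (ppm) from OM (%)
OM_INTERCEPT, OM_SLOPE = 1.344, 0.0197    # Eq. (2): OM (%) from TS_crDOA600


def predict_om(ts):
    """Eq. (2): OM (%) from the spectral variable TS_crDOA600."""
    return OM_INTERCEPT + OM_SLOPE * ts


def predict_pb(ts):
    """Eq. (3): Pb (ppm) obtained by substituting Eq. (2) into Eq. (1)."""
    return PB_SLOPE * predict_om(ts) + PB_INTERCEPT


def predict_pb_expanded(ts):
    """Eq. (3) multiplied out: intercept 0.0775 * 1.344 + 35.286, slope 0.0775 * 0.0197."""
    return (PB_SLOPE * OM_INTERCEPT + PB_INTERCEPT) + PB_SLOPE * OM_SLOPE * ts


if __name__ == "__main__":
    for ts in (0.0, 100.0):
        print(ts, predict_pb(ts), predict_pb_expanded(ts))
```

Keeping the two steps as separate functions mirrors the inference-model idea: the intermediate OM estimate remains available and can be checked against laboratory OM values.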

Therefore, the proposed “inference model” approach, based on key soil constituents that absorb VNIR-SWIR radiation, could become a powerful tool for improving the accuracy of heavy metal prediction. Although pure metals do not absorb in the VNIR-SWIR region, they may be detected indirectly through complexation with organic matter (OM), association with detectable moieties such as iron oxides, or adsorption to clays that absorb light in this wavelength range (Table 6).
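One way to build such a chained model is to fit two simple regressions: the constituent (e.g., OM) from a selected wavelength variable, and the metal from the constituent. The Python sketch below assumes ordinary least-squares straight-line fits for both steps, which matches the linear form of Eqs. (1) and (2) but is not necessarily how the coefficients in Table 6 were derived. The helper names are hypothetical, and the usage data are exactly linear toy values chosen so that the result can be checked by hand.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_inference_model(band_values, constituent, metal):
    """Fit constituent ~ band and metal ~ constituent; return a band -> metal predictor."""
    c0, c1 = fit_line(band_values, constituent)  # step 1: e.g. OM from one spectral variable
    m0, m1 = fit_line(constituent, metal)        # step 2: metal from the constituent

    def predict(band_value):
        return m0 + m1 * (c0 + c1 * band_value)

    return predict


if __name__ == "__main__":
    # Toy, exactly linear data: constituent = 1 + 2 * band, metal = 7 + 3 * constituent.
    predict = fit_inference_model([0, 1, 2, 3], [1, 3, 5, 7], [10, 16, 22, 28])
    print(predict(4))  # 7 + 3 * (1 + 2 * 4)
```

Fitting the two steps separately lets each regression use whatever samples have laboratory values for that step, at the cost of propagating errors from the first fit into the second.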

Validation of selected heavy metals (Cd, Fe and Pb) using XRD and SEM

The morphology and composition of the soil samples were examined by scanning electron microscopy (SEM) and X-ray diffraction (XRD). The samples can be divided into two groups: those with low organic matter content (OM ≤ 3%, e.g., soils 12, 22 and 25) and those with high organic matter content (OM > 3%, e.g., soils 5, 9 and 15) (Table 2). Samples in the low-OM group contain particles composed of Si, Al and Fe with small amounts of K and Ca. As the OM content increases, agglomerates of particles, as described by Méndez et al. (2009), appear in the SEM images. Samples in the high-OM group have more compact and uniform aggregates (Fig. 7), formed of irregularly shaped particles with rounded edges. The XRD results show that, together with Mg, Si, Al, Fe and K, the particles contain heavy metal elements such as Cd and Pb (Fig. 7).

Fig. 7 Morphology and composition of soil samples containing Cd, Fe and Pb, examined by scanning electron microscopy (SEM) and XRD

Calcium may originate from calcite (CaCO3) in the samples. Microaggregates (<250 μm) are visible in the SEM images (Fig. 7). Aggregation is generally facilitated by CaCO3 and clays (Six et al. 2002), both of which the chemical analysis showed to be present in the samples. The high pH observed could also indicate the presence of cations such as Ca2+, which form bridges between clay and OM (Bronick and Lal 2005).

Conclusion

VNIR-SWIR spectroscopy is a straightforward, non-destructive analytical method that can be used to predict heavy metals, even though they are spectrally featureless. High concentrations of heavy metals can be predicted using reflectance spectroscopy, but heavy metal elements cannot be identified with reflectance spectroscopy at concentrations below 1000 mg kg−1. It is therefore harder to determine the low concentrations of heavy metals in the soils of the Bahr El Baqar region. Organic matter masks spectral signatures, making it difficult to quantify metals at these wavelengths. The fingerprint region of 538, 578, 630, 870, 1900, 2240 and 2376 nm is very useful for recognizing small differences in the heavy metal contents of the Bahr El Baqar region. A close match between several spectra in this fingerprint region constitutes strong evidence for the identity of heavy metals in the spectra. Several heavy metals (Fe, Cu, Cd and Pb) have sharp bands within these regions. Most single bonds give rise to absorption bands within these frequencies and, because of similarities in their energies, strong interactions occur between neighboring bonds. PLSR was able, to some extent, to model heavy metals from laboratory spectra, and heavy metals could be predicted with moderate accuracy. The best coefficients of determination (R²) between measured and predicted contents were 0.62 for Mn, 0.66 for Pb, 0.66 for Zn, 0.69 for Ni and 0.60 for Cu; Mn, Pb, Ni, Cu and Zn were thus successfully predicted using PLSR. To increase prediction accuracy, specific wavelengths were used together with OM, iron oxides and clay. Correlation analysis revealed that the most important wavelengths for prediction could be attributed to the absorption features of OM, clay and iron oxides. Group I contains Cr, which is related to iron oxides, clays and organic matter; Group II contains Ni and Mn, which are related to iron oxides and clays; Group III contains Cu, which is related to iron oxides; Group IV contains Cd and Pb, which are related to organic matter; and Group V contains Co, which is related to clays.
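The R² values reported above compare measured and predicted contents. The text does not state which definition was used, so the Python sketch below assumes one common choice, R² = 1 − SS_res/SS_tot. Note that the squared Pearson correlation between measured and predicted values, another frequently reported variant, can differ from this when predictions are biased. The function name and example values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (one common definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))  # residual sum of squares
    ss_tot = sum((m - mean) ** 2 for m in measured)                  # total sum of squares
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measured vs. predicted concentrations (ppm).
    print(r_squared([1, 2, 3, 4], [1.5, 2, 3, 3.5]))
```

Under this definition, predicting the mean for every sample gives R² = 0, and a model that does worse than the mean gives a negative value.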

Owing to the interference effects of organic matter and the low heavy metal concentrations, selecting the most sensitive original spectral bands still did not yield highly accurate predictions. It is therefore important to predict heavy metals through OM, iron oxides or clays first. To identify specific wavelengths for predicting the studied heavy metals, the regression coefficients with OM, Al2O3 and Fe2O3 were used. The most significant peaks for Zn prediction are at 430, 570 and 1700 nm; for Cu at 538, 1259, 1500 and 2184 nm; for Pb at 440, 578, 915 and 1894 nm; for Cr at 860, 1001 and 2376 nm; for Ni at 870, 909 and 2240 nm; for Cd at 630 and 1270 nm; and for Co at 1900 and 2300 nm.

These results indicate that heavy metals in soils can be predicted using rapid and economical reflectance spectroscopy. In future investigations, hyperspectral sensors may help optimize the estimation of heavy metal contents at large scales. Future research will also focus on establishing definitively whether Cd and Pb contamination in soils can be assessed directly or indirectly in the VNIR-SWIR region, using soils artificially spiked with Pb in the range 50–10,000 ppm. An urgent research need in Egypt is the development of soil spectral libraries that will improve the predictive ability of VNIR-SWIR spectroscopy for soil minerals and other soil attributes, whatever the soil type. Coupling spectral libraries with other diffuse reflectance measurements of soils, such as mid-infrared reflectance spectra, will probably be the next step towards spectral sensing of soil heavy metals.