Introduction

Increasing consumer demand for high-quality olive oils has contributed to the growth of origin-labelled products, as provenance has long been considered one of the most important influences on olive oil quality (Longobardi et al. 2014).

For instance, the protected designation of origin (PDO), one of the geographical indication (GI) schemes regulated by the European Union (EU), attributes the unique properties and characteristics of a product to the geographical location in which it was entirely prepared, processed and produced (Pepi et al. 2017). Consequently, GI labels are considered to be among the most powerful marketing tools available, and products bearing such labels attract premium prices (Likudis 2016). Tunisia is currently aiming at this niche market, and is seeking to establish origin-labelled olive oils from potential GI areas (Laroussi-Mezghani et al. 2015; Mnasri-Rahmani et al. 2019).

The geographical origin of an olive oil affects its composition and quality through a combination of, and the specific balance between, natural and human factors such as climate, soil, underlying parent material, cultivation practices, and the experience and knowledge of the oil producer (Ranalli et al. 1999). This relationship between a foodstuff and its geographical origin is well represented by a term that has long been used in wine marketing: “terroir.”

Terroir studies have been performed in various fields and using a variety of approaches (e.g. spatial mapping, pedology, geochemistry, and geophysics) because the terroir is a result of dynamic interactions between a large number of physical factors (Wilson 2007). From a geological viewpoint, the terroir is defined as the geochemistry of the soil and water or as the food properties that can be affected by the climate, soil, geology, and cultivar used during ingredient cultivation (Pepi et al. 2017). From this perspective, terroir studies rely on the principle of biogeochemical cycling of chemical elements from the soil to the plant to the foodstuff (Pepi et al. 2017). However, until recently, this transfer of elements to olive oil was not well studied. Recent investigations have shown that the soil’s geochemical signature, which varies with the region considered, is transferred to the olive oil. This means that the levels of certain chemical elements in the oil depend on the region in which the olives were cultivated, thus potentially allowing us to discriminate between oils from different regions (Beltrán et al. 2015; Damak et al. 2019a). In addition, those studies found that the concentrations of major and trace elements in olive oils were affected by the geographical origin of the oil more than by the cultivar, which highlights the important role of the soil and its geochemistry in the mineral composition of olive oils (Benincasa et al. 2007; Beltrán et al. 2015; Damak et al. 2019a). Moreover, our previous pilot study of paired Tunisian olive oil and soil samples collected during the 2015/2016 harvesting campaign from four regions demonstrated, for the first time, the potential use of multiple elements in the oil as excellent geochemical markers. These markers are capable of discriminating between olive oils originating from different regional climates and (especially) geologies (Damak et al. 2019a). Nonetheless, to gain a better understanding of the relationship between the multielemental composition of the olive oil and its environment, and to demonstrate the applicability of multielements as markers of the geographical origin of the oil, correlation analysis between the geochemical composition of the soil from a particular region and the olive oil originating from that region was carried out. Our previous results, however, indicated that the correlation between the composition of the oil and the soil was poor, and that, amongst the 11 elements determined, only the levels of Zn and Pb in the oil were found to be significantly (negatively) correlated with their levels in the soil. This result indicates that the elemental profile of the oil cannot be predicted from the geochemical composition of the soil in its region of origin, and the negative correlation implies that other factors (ignoring anthropogenic causes) may also modulate the geochemical composition of the olive oil (Damak et al. 2019a). For instance, Greenough et al. proved that climate is an important factor in the solubilities and concentrations of elements, and thus their bioavailability to the grapes used to make wine (Greenough et al. 2010). For this reason, other environmental variables need to be investigated, including edaphic and climatic features. Edaphic variables include the bioavailable fraction of the inorganic elements of soil as well as ancillary variables such as the chemical characteristics of the soil—its pH, electrical conductivity (EC), organic matter content (OM) and inorganic carbon (in the form of calcium carbonate CaCO3) content, which are known to affect element bioavailability. Climatic variables include other ancillary factors that might affect the bioavailability, such as temperature, precipitation and humidity (Bakircioglu et al. 2011; Han et al. 2011; Shen et al. 2013).

Evidence of significant correlations between the multielemental composition of olive oil and its provenance factors could lead to the evolution of multielement-based geographical traceability models from chemometric qualitative (classification) models to spatial quantitative (element landscape) models, in a similar way to how tracing based on stable isotopes of carbon and oxygen led to the development of “isoscapes” (isotope landscapes) (Chiocchini et al. 2016). Such a development would be significant, as chemometrics is based on the classification of samples into predefined classes, so it is more appropriate when the olive oils are produced in a confined and limited geographical area (Oliveri and Simonetti 2016), whereas olive growing in Tunisia, for instance, is a large-scale agronomic practice (over 30% of the cultivated land in Tunisia is devoted to olives, and the olive orchards are spatially contiguous and constitute 1.7 million hectares of olive plantations; IOC 2017).

Taking all of the above into account, an extensive and almost exhaustive sampling campaign that aimed to perform measurements of a great number of Tunisian olive oil samples was conducted in December 2017. This campaign was designed and executed to achieve two main goals. The first goal was to extend the database of the multielemental compositions of Tunisian monovarietal olive oils to the major olive-growing regions. The exact locations of the olive orchards from which these olive samples were taken were recorded by a GPS to ensure confidence in the origins of the olives. This database is critical, as it can serve as a reference list of the inorganic compositions of olive oils from all the sampled regions. The second goal was to enhance and deepen our understanding of the relationship between the multielemental composition of olive oil and its environment.

Multielement-based geographical traceability techniques are based on the hypothesis that the mineral content of the soil in which the olives are cultivated dictates the multielemental composition of the resulting olive oil. Accordingly, the differences between the multielemental compositions of the soils from the regions to be discriminated should be large enough to also permit the olive oils from those regions to be discriminated. Therefore, it would be useful to identify the elements that characterize the geographical origin of the soil and could act as geochemical fingerprints. This study was a continuous effort to verify that hypothesis. Also, it is important to investigate the relationships of the multielemental composition of olive oil to environmental factors such as the climate and the soil chemical composition, which are considered important components of the terroir, taking our previous results and information in the literature into account.

Thus, the main objectives of this study were:

  1. (1)

    To characterize Tunisian olive terroirs from most of the olive-producing regions in Tunisia, some of which are reported here for the first time. This terroir characterization included, among other aspects, geochemical characterization of the soils and olive oils, which was important for geographical identification and to confirm our previous finding that multiple elements are suitable geographical markers provided that the geochemical composition of the olive oil depends significantly on its region of origin in Tunisia.

  2. (2)

    To investigate and quantify the relationships between the geochemical composition of the olive oil and its provenance factors—in particular the geochemistry of olive orchard soils, edaphic characteristics and climatic parameters—to demonstrate the utility and validity of using the multielement composition of the oil to verify its claimed geographical origin.

Materials and methods

Sampling

Area sampled, local geology, climate and sample collection

Eighty-six olive oils and 137 soil samples were collected in December 2017. Figure 1 shows the locations of the sampling points. The sampling covered a large area spanning the country from north to south and from east to west, and focused on the main olive-producing regions. The sampling locations were classified into 11 groups that correspond to the governorates of Tunisia: Nefza in Beja governorate (3 sites), Zaghouan (7 sites), Nabeul (8 sites), Sousse and Mahdia (a.k.a. the Sahel, 6 sites), Sfax (10 sites) including Kerkennah island (5 sites), Sidi Bouzid (8 sites), Gasserine (5 sites), Gafsa (6 sites), Medenine (the southeast, 9 sites) and Kairouan (2 sites—a total of 68 sites. The sampling was performed in Zaghouan, Nefza, Nabeul and Kerkennah in a less well-distributed manner than in the large Sahel and Sfax area, Gasserine, Sidi Bouzid and the south, where we attempted to include points from the four corners of each region in the sampling. However, the sampling was relatively sparse in all regions. During sampling, we tried to cover all possible variability in the bedrock in each region and include as many active farms as possible, but there were still lots of unsampled areas considering the enormous area sampled and the practical limitations on time and resources. However, our main objective was to achieve representative sampling from each olive-producing region.

Fig. 1
figure 1

Geological map showing the locations of all sampling points during the 2017/2018 olive harvesting campaign and the outcrops

Table S1 in the Supplementary Information lists the geographical origins (code of the sampling site, name of the governorate) and other information (olive cultivar and number of samples) for the olive fruit and soil samples. It also shows the geological setting (age and lithology of the underlying outcrop) deduced from the intersection of the point feature corresponding to the sampling point with the shapefile layers of the different outcrops on ArcMAP (ArcGIS, 10.4.1). Climatic features can have a significant effect on plant distribution and chemistry, and they can be introduced as mean values for a year or growing season (Han et al. 2011). Table S2 in the Supplementary Information shows the mean values of climatic parameters—including average annual temperatures (minimum, maximum and mean), annual precipitation and annual relative humidity—in each of the 11 regions sampled. These climatic data were obtained from the NASA Langley Research Center (LaRC) POWER project funded through the NASA Earth Science/Applied Science Program (see https://power.larc.nasa.gov/).

Sample preparation and analysis

Olive oil samples

Olive oil extraction

Each batch of olive oil was extracted from 1.5 kg of whole washed olive fruit using a solely mechanical method at the Sfax Olive Institute. A laboratory-scale two-phase oil mill was used, and extraction was carried out via the standard method employed in industrial olive oil mills. This method consists of crushing, malaxation for 30 min at ambient temperature, centrifugation without the addition of water, and finally natural decantation. This process yields an oily phase and a very wet pomace phase. Olive oil samples were stored in amber glass bottles at 4 °C until their inorganic element compositions were analyzed at the Advanced Analysis Center, National Agriculture and Food Research Organization (NARO), Japan.

Ultrasound-assisted extraction of multiple elements

The samples were prepared according to a method first developed by Camin et al. (2010) and subsequently adopted and well described by Damak et al. (2019b). Briefly, 10 g of an olive oil sample that had previously been centrifuged at 3000×g were transferred to a 50 mL polypropylene (PP) centrifuge tube. Ten milliliters of an aqueous extraction solution prepared from 6.7% H2O2 [30%, atomic absorption spectrometry (AAS) grade, Kanto Chemical Co., Inc., Tokyo, Japan], 1% HNO3 [61%, electronic grade (EL), Hayashi Pure Chemical Ind., Ltd. (HPC), Osaka, Japan] and 0.2% HCl (35%, AAS grade, Kanto Chemical Co., Inc.) were added to the oil. The mixture was thoroughly shaken by a vortexer and then placed in an ultrasonic bath (300 W, 26 °C) for 15 min. After undergoing ultrasonic extraction, the mixture was centrifuged at 3000×g and the supernatant-depleted oily phase was meticulously removed by aspiration. The enriched aqueous phase was then transferred to a clean PP tube to which indium (115In) was added as an internal standard at a concentration of 1 µg L−1.

ICP-MS calibration and measurements

External calibration curves were constructed using twelve different mass concentrations including the blank. Standard solutions were generated by diluting a calibration mixture prepared by mixing a multielement solution (XSTC-622B) containing 34 elements (Ag, Al, As, B, Ba, Ca, Cd, Co, Cr, Cs, Cu, Fe, Ga, Ge, K, Li, Mg, Mn, Mo, Na, Ni, P, Pb, Rb, Sb, Se, Si, Sn, Sr, Ti, V, W, Zn and Zr, all at a concentration of 10 mg L−1; SPEXCertiPrep, Metuche, NJ, USA) in a 5.0% w/w nitric acid matrix with a multielement solution (XSTC-1) containing 16 rare-earth elements (Ce, Dy, Er, Eu, Gd, Ho, La, Lu, Nd, Pr, Sm, Sc, Tb, Tm, Yb and Y, all at a concentration of 10 mg L−1; SPEXCertiPrep) in a 5.0% w/w nitric acid matrix and four single-element solutions of Fe, Mg, Ca and Na at concentrations of 1000, 1000, 10,000 and 1000 mg L−1, respectively (SPEXCertiPrep) in a 5.0% w/w nitric acid matrix.

The resultant concentrations of the elements in the calibration mixture were 5500 µg L−1 for Ca and Na, 1500 µg L−1 for Fe and Mg, 500 µg L−1 for the elements in the XSTC-622B, and 2 µg L−1 for the elements in XSTC-1. This calibration mixture was further diluted in an aqueous solution with the same composition as that used to extract the oil samples in order to matrix match the calibration solutions and the sample solutions. Twelve solutions were consequently prepared to calibrate the ICP-MS. The range of concentrations was 1.375–2750 µg L−1 for Ca and Na, 0.375–750 µg L−1 for Fe and Mg, 0.125–125 for the elements in XSTC-622B, and 0.0005–0.5 for the elements in XSTC-1.

The elements were quantified using ICP-MS (NexION 300XX, PerkinElmer, Waltham, MA, USA). Samples were introduced by means of a borosilicate glass nebulizer. ICP-MS is known to suffer from unwanted polyatomic isobaric interferences, so the elements were monitored in kinetic energy discrimination (KED) He collision mode to check for and reduce polyatomic interferences, and the appropriate isotopes were used. Instrument performance was checked via midrange continuing calibration verification (5 µg L−1) every ten samples. Indium was used as an internal standard and added to all samples, calibration solutions, method blanks and solutions prepared for quality control at a concentration of 1 µg L−1. The operating parameters of ICP-MS were as follows: RF power, 1600 W; plasma Ar flow rate, 18 L min−1; auxiliary Ar flow rate, 1.2 L min−1; nebulizer (carrier gas) flow rate, 0.98 L min−1; sampler and skimmer cones of nickel; lens voltage, − 11.5 V; analog stage voltage, − 1800 V; pulse stage voltage, 1100 V; discriminator threshold, 13 V; quadrupole rod offset, − 12 V; pulse detector; speed of peristaltic pump, 20 rpm; 20 sweeps/reading; 3 replicates; dwell time, 50 ms; peak hopping scan mode; rejection parameter a, 0; rejection parameter q, 0.25; and KED mode He gas flow, 3.5 L min−1.

Isotopes: 7Li, 11B, 23Na, 24Mg, 27Al, 31P, 44Ca, 47Ti, 51V, 53Cr, 55Mn, 57Fe, 59Co, 60Ni, 63Cu, 66Zn, 71Ga, 74Ge, 75As,78Se,85Rb, 88Sr, 89Y, 90Zr, 98Mo, 111Cd, 118Sn, 121Sb, 133Cs, 135Ba, 139La, 140Ce, 141Pr, 147Sm, 153Eu, 157Gd, 159Tb, 163Dy, 165Ho, 166Er, 169Tm, 172Yb, 175Lu, 184W, 208Pb.

Soil samples

Soil samples were taken from the surface layer (from 0 to 30 cm deep) and the subsurface layer (from 30 to 60 cm deep) using a hand auger, except for some soil samples from Gafsa (7 samples), Sidi Bouzid (2 samples), Sfax (1 sample) and Kerkennah (1 sample), where only the top layer was sampled. Two layers were sampled to investigate the influence of soil depth on the elemental concentration and because greater olive root density was encountered in the 30–60 layer during sampling. Soil samples were dried at 40 °C to remove moisture and passed through a 2 mm nylon-screen sieve directly after sampling.

Total concentration measurements

Major and trace elements in the soil samples were measured at the facilities of the Geological Survey of Japan (GSJ), part of the National Institute of Advanced Industrial Science and Technology (AIST) in Japan, using a method developed by the GSJ. The method consisted of preparing high-dilution-ratio fused glass beads for the X-ray fluorescence (XRF) determination of major element oxides (Ejima et al. 2018) and using the same beads for trace element measurements by femtosecond ultraviolet laser ablation ICP-MS (LA-ICP-MS) (Kon and Hirata 2015).

XRF analysis. The glass beads were prepared by mixing 0.5 g of the powdered soil sample with 5.0 g of lithium tetraborate flux. The mixture was heated to 1200 °C for 10 min using a semiautomatic fusion device (HAG-M-HF, Herzog, Osnabrück, Germany). Ten major elements in each soil sample (Si, Ti, Al, Fe, Mn, Mg, Ca, Na, K and P) were quantified by an XRF instrument (ZSX Primus III+ , Rigaku Corp., Tokyo, Japan) with an Rh tube. The accuracy was verified each day using two reference samples (JB-1b and JG-3). The analytical uncertainties for each element were better than 1.5%, as estimated from the long-term reproducibility of measurements of the geochemical CRM JB-1b.

LA-ICP-MS analysis. To obtain reliable abundance data for the trace elements in the soil samples, we used a quadrupole ICP-MS system (model 7500cx, Agilent Technologies Japan Ltd., Tokyo, Japan) coupled with a 260 nm (UV) titanium–sapphire femtosecond laser ablation system (IFRIT, Cyber Laser Inc., Tokyo, Japan). The laser ablation was performed using a crater size of 20 μm, a pulse energy of 10 J cm−2, an emission repetition rate of 1000 Hz, and a pulse duration of 150 s. A galvanometric optical scanner was employed to minimize elemental fractionation and to achieve effective ablation of the glass beads at a rastering speed of 10,000 μm s−1 in a rastered area of 400 × 400 μm. To quantify the concentration of each element, GSJ geochemical reference samples (JP-1, JB-1b, JB-2, JB-3, JA-1, JA-2, JA- 3, JR-1, JR-2, JR-3, JGb-1, JGb-2, JG-1a, JG-2 and JG-3) were used to construct calibration curves. The ICP-MS operating conditions and parameters were: RF power, 1600 W; cool gas flow rate, 15 L min−1; auxiliary gas flow rate, 1 L min−1; carrier gas flow rate, 0.8 L min−1; peak jump scanning mode; time-resolved analysis mode; integration time, 240 s/sample; dwell times of 10 ms for Li, 20 ms for V, Cr, Rb, Sr, Ba and Pb, and 40 ms for other elements; sweep time, 1.6 s; detector mode was analog for Li and P/A for other elements. The isotopes measured were: 6Li, 51V, 52Cr, 59Co, 61Ni, 63Cu, 66Zn, 75As, 85Rb, 88Sr, 89Y, 90Zr, 93Nb, 95Mo, 111Cd, 118Sn, 121Sb, 133Cs, 137Ba, 139La, 140Ce, 141Pr, 146Nd, 147Sm, 151Eu, 153Eu, 157Gd, 159Tb, 163Dy, 165Ho, 166Er, 169Tm, 172Yb, 175Lu, 178Hf, 181Ta, 182W, 204Pb, 205Tl, 206Pb, 207Pb, 208Pb, 209Bi, 232Th, 238U.

Bioavailable fraction extraction and measurement

It was important to determine the bioavailable fraction because olive trees cannot extract all of the elements from the surrounding soil—some elements are strongly bound to the soil structure (Intawongse and Dean 2006). Therefore, the 0.01 M CaCl2 single-extraction procedure was used to determine the bioavailable fraction of the elements in soil samples (Houba et al. 2000; Bakircioglu et al. 2011). The 0.01 M CaCl2 extraction solution utilized in this procedure has an ionic strength comparable to that of the soil solution, given that Ca2+ is its main cation. In addition, 0.01 M CaCl2 is unbuffered and so the necessary reactions and interactions occur at the pH of the soil. Also, once it enters the soil suspension, the calcium cation from the CaCl2 provides the advantage of preventing the colloidal material in the soil from coagulating by enhancing aggregate stability (Amoakwah et al. 2013).

One gram of soil in 10 mL of the 0.01 M CaCl2 mixture was shaken for 2 h using a mechanical shaker at room temperature. After that, the mixture was centrifuged at 1800×g to separate the solid and liquid phases. Consequently, 15 mL of the supernatant were carefully collected, transferred to a PP tube, and acidified with 0.71 mL of 20% HNO3 (yielding 1% HNO3 in the final sample solution volume to matrix match the calibration solutions) to preserve the extract and prevent the adsorption of ions on the tube surface and bacterial growth until the metals were determined by ICP-MS analysis at the facilities of GSJ. Indium was added as an internal standard at a concentration of 1 µg L−1. The ICP-MS instrument was calibrated by mixing the same multielement and single-element standard calibration solutions (XSTC-622B, XSTC-1, Ca, Na, Fe and Mg; SPEXCertiPrep) in 5.0% w/w nitric acid and then diluting this mixture to achieve analyte concentrations similar to those expected to occur in the extracts. The calibration ranges were as follows: 0.05–5000 µg L−1 for Ca, 0.02–2000 µg L−1 for Fe, 0.005–500 µg L−1 for Mg, 0.002–200 µg L−1 for Na, 0.0002–20 µg L−1 for the elements in XSTC-622B, and 0.00002–2 µg L−1 for the elements in XSTC-1.

Isotopes: 23Na, 24Mg, 27Al, 45Sc, 47Ti, 51V, 53Cr, 55Mn, 57Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 72Ge, 75As, 85Rb, 88Sr, 89Y, 90Zr, 95Mo, 118Sn, 121Sb, 133Cs, 137Ba, 139La, 140Ce, 141Pr, 146Nd, 147Sm, 151Eu, 153Eu, 157Gd, 159Tb, 163Dy, 165Ho, 166Er, 169Tm, 172Yb, 175Lu, 182W, 208Pb.

Chemical characterization of the soil samples

pH and electrical conductivity (EC). The pH and EC were measured in the same suspension of 1:5 soil/distilled water as described by Rayment and Higginson (2002). The mixture was shaken for 1 h in a mechanical shaker at room temperature. The suspension was allowed to rest for 1 h before its pH was measured. The EC was measured after 24 h in the same suspension (Wali et al. 2015).

Organic matter content (OM) and inorganic carbon content (CaCO3). OM and CaCO3 were determined based on the loss on ignition (LOI) procedure. One gram of soil was added to a ceramic crucible, placed in a muffle furnace (FP410, Yamato Scientific Co., Ltd., Tokyo, Japan) at 500 °C for 4 h, cooled to room temperature in a desiccator and weighed. The LOI at 500 °C was calculated by dividing the difference between the mass of the dry soil at 105 °C and the soil mass after calcination by the dry soil mass at 105 °C. The LOI at 1050 °C (2 h hold) was calculated by dividing the difference between the soil mass combusted at 500 °C and the soil mass at 1050 °C by the soil mass at 500 °C (Heiri et al. 2001; Wright et al. 2008).

Data analysis

The Kruskal–Wallis test, Spearman correlation analysis, principal component analysis (PCA) and linear discriminant analysis (LDA) were all performed using the IBM SPSS Statistics software for Windows, version 24.0 (IBM Corp., Armonk, NY, USA). These tests are well described in Damak et al. (2019a).

Results and discussion

Geochemical characterization of the soil as an important terroir component

Influence of soil depth on elemental concentrations

Table 1 shows the total concentrations of various elements in the soil samples. The major elements were analyzed by XRF and the trace and rare-earth elements (REEs) were determined by LA-ICP-MS. To the best of our knowledge, this is the first characterization of the geochemical compositions of agricultural soils from many different Tunisian regions to be reported in the literature.

Table 1 Element contents, pH and EC in soil samples from various locations, regardless of the sample depth

The normality of the data was tested using the Shapiro–Wilk test, which indicated that the data had a nonnormal distribution. Therefore, analysis of variance was based on the nonparametric test. The nonparametric Kruskal–Wallis test showed that, for each element, there was no significant difference (p < 0.05) between the concentrations of the same element in the two soil layers. Taking into account that most of the sampled soils were not tilled (tillage helps to mix the different soil layers), this result may indicate a lack of contamination of the surface soil layer, which is usually enriched with heavy elements resulting from the application of chemicals to agricultural soils (Shen et al. 2013). Indeed, many international and national governmental reports have stated that Tunisia has the largest surface area of organic olive groves in the world (> 245,000 ha; Lernoud and Willer 2019), and chemicals are used sparingly or not at all during the cultivation of olive trees in Tunisia (Ben Khedher 2012; Laroussi-Mezghani et al. 2015). The absence of chemical contamination of Tunisan soils is supported by data reported in the literature on the background levels of elements in soils. Background levels of the heavy elements As, Cr, Cu, Fe, Mn, Ni, Pb, Zn and Cd, which may accumulate as a result of anthropogenic activities, are reported to be 9.47, 61.09, 8.37, 20,696, 555, 31.72, 21.05 and 52.74 mg kg−1, respectively (Halim et al. 2014), which are very close to the concentrations of these elements in the samples collected in this study (with a few exceptions).

Influence of origin on element concentrations in soil

Table 1 shows that, for soil samples from all regions, Si had the highest concentration among the major elements, followed by Ca (except in Nefza and Kairouan), Al, Fe, K, Mg, Ti (except in Medenine), Na, P (except in Nefza) and Mn. Zr had the highest concentration among the trace elements, followed by Ba and Sr. Among the rare-earth elements (REEs), the light REEs had higher concentrations than the moderately heavy and the heavy REEs: Ce had the highest concentration, followed by La, Nd, and Pr. In most cases, the concentrations of major, trace and rare-earth elements varied with the region. Indeed, the nonparametric Kruskall–Wallis test (p < 0.05) indicated that when soil samples from different regions were compared in a pairwise manner, all of the elements except for Cd, K and Tl displayed significant differences in concentration between regions in at least two of those pairwise comparisons.

In general, the chemical elements are correlated due to similarities in their physicochemical characteristics and geochemical phenomena (Bazon et al. 2013). Therefore, given the high number of elements, a widely used data reduction method—principal component analysis (PCA) with Varimax rotation—was applied to the dataset composed of 51 variables × 76 cases (here, a “case” corresponds to a sample and a “variable” corresponds to an inorganic element) to simplify data interpretation. The first PCA conducted used an eigenvalue cutoff criterion of > 1 to select the number of generated components. This PCA generated seven principal components (PCs) to replace the original 51 variables, and these seven PCs explained 88.8% of the original variance. When the associated scree plot and the table of the total variance explained were examined, it was found that the first two PCs explained the majority of the original variance (52.2% and 12.4% for PC1 and PC2, respectively). Therefore, the analysis was conducted a second time with forced selection of two PCs. The cumulative variance explained by the two extracted PCs was 73.7%, or nearly three-quarters of the information in the dataset (which initially included 51 variables).

Examination of the variable loading plot (Fig. 2) reveals several interesting features. First, PC1 is dominated by REEs, some heavy elements such as Th, U, Nb, Ta, W, Rb, Ti, Pb, Sn, V, Cr, Al, Fe, Mn, Zn, Cs and Co, as well as the LOI at 500 °C, which represents the organic matter (OM) content of the soil. Second, Ca, the LOI at 1050 °C (which represents the inorganic carbon content of the soil, mostly in the form of CaCO3) and Sr score high on the positive side of PC2 whereas Si has a high loading on the negative side of PC2. Third, Zr and Hf together form a separate group, as do Cd and Tl. Finally, P appears in the plot next to Ba, Mg, Cu, Sb, As, Bi, Mo and Ni.

Fig. 2
figure 2

Loading plot of the original 51 variables on PC1 and PC2. EC electrical conductivity, LOI500 loss on ignition at 500 °C, LOI1050 loss on ignition at 1050 °C

These associations indicate the presence of some geochemical processes. For instance, the associations along PC2 relate to Sr enrichment by substitution of the major Ca cation in carbonate minerals. PC1 also reflects the enrichment of REEs and some heavy elements in soil particles with negatively charged surfaces such as organic matter and Al, Fe and Mn oxides. Due to their pH-dependent surface charge, these particles show high retention and adsorption capacities for lanthanide ions released during the weathering of primary minerals (Papangelakis and Moldoveanu 2014). Zr and Hf are from the same group in the periodic table and thus present similar properties (a valence of 4, an atomic radius of around 0.72), as evidenced by their linear relationship (R2 = 0.9977; Fig. S1 in the Supplementary Information) and confirmed by the Spearman bivariate correlation (cc = 0.992, p < 0.001). As previously mentioned by Chandrajith et al. (2005), Hf occurs mainly in zircon minerals, where it substitutes Zr, and these elements display a strong linear relationship.

Next, in order to understand these associations, the position of each sample in relation to PC1 and PC2 was graphically represented on a score plot (Fig. 3). In the plot, the samples originating from Zaghouan (F12–F18; Table S1 in the Supplementary Information) can be superposed on Ca, CaCO3 (represented by LOI1050) and Sr on PC2. Soil samples from site F12 had the highest PC2 scores. Among all the samples, the concentrations of Ca, Sr and CaCO3 were found to be highest in the F12 samples. However, F14 and F18, which were also from Zaghouan, had low PC2, probably because they had relatively low concentrations of Ca compared to the samples from the other sites. Note that the samples from each group tend to populate the same quadrant of the graph, even though they may not be close to each other.

Fig. 3
figure 3

Score plot of the samples on PC1 and PC2

The samples from sites F37, F14 and F36 had the highest positive PC1 scores. F36 and F37 originate from Nefza and F14 originates from Zaghouan. As we previously mentioned, the variable loading biplot (Fig. 2) shows that PC1 is dominated by REEs and heavy elements such as Th, U, Nb, Ta, W, Rb, Ti, Pb, Sn, V, Cr, Al, Fe, Mn, Zn and Coas well as OM (labelled as LOI500 in the plot). An examination of the median contents of these elements in all 11 olive-growing regions (Table 1) indicated that soil samples from Nefza and Zaghouan had the highest concentrations of all the REEs, Th, U, Al and Fe. The levels of light REEs were particularly high; in particular, among all 11 regions, the abundances of Ce, La and Nd were the highest in Nefza and Zaghouan (Fig. S2 in the Supplementary Information).

The calculated coefficients of determination showed that La, Ce and Nd had strong linear relationships (R2 = 0.9855 and 0.9973, respectively), implying that they showed similar behavior in all types of soils from the 11 regions (Figs. S3 and S4 in the Supplementary Information).

On the other hand, an examination of the concentrations of the major elements (Si, Ca, Fe and Al) revealed three main features: (i) Zaghouan soils had the lowest concentrations of Si and the highest concentrations of Ca; (ii) Al and Fe were most abundant in soils from Nefza; and (iii) the highest Si concentrations were found in Kairouan, Sfax, Sidi Bouzid and Nabeul soils, respectively (Fig. S5 in the Supplementary Information).

Based on our previous observations, and as previously stated by Chen and Yang (2010), the enrichment of REEs in the soils from Nefza can be explained by a high degree of soil development and high leaching losses of Ca, Mg, K and Na (but not Al, Fe and REEs because they are less mobile species). Indeed, Nefza soils had the lowest concentrations of Ca and the highest concentrations of Al, Fe and REEs among all the soil samples, which supports the notion that the soils in that region are highly developed (note that this region also has the greatest precipitation and humidity; see Table S2 in the Supplementary Information).

Moreover, this particularly high abundance of REEs in Nefza soils can also be explained by their particularly high OM contents (Fig. 4a), which, as previously mentioned, is responsible for their high capacity to retain lanthanides. In fact, Spearman correlation coefficients calculated for the OM content and REE concentrations revealed that there were significant positive correlations between these parameters (cc values ranged from 0.755 to 0.866, p < 0.001). This finding agrees with that reported by De Sá Paye et al. (2016) regarding positive correlations of REE levels in Brazilian soils with the soil OM content. The relative enrichment observed in the Zaghouan soils is probably due to the alkaline barrier at the soil horizon near the bedrock, which causes REE cations to precipitate (Reinhardt et al. 2018). This alkaline barrier is due to carbonate dissolution, which provides a neutral to alkaline buffer. The existence of this alkaline barrier is demonstrated by the fact that most of the sampling sites in Zaghouan are located above an early Cretaceous marl, alternating marl–limestone and sandstone formation (Table S1 in the Supplementary Information). Furthermore, we obtained a soil profile (Fig. S6 in the Supplementary Information) at F18 in Zaghouan that had developed above Mio-Pliocene continental sediments made of calcareous and dolomitic conglomerates and sandy clays, and which exhibited three horizons: A, Bk1 and Bk2. The lowercase k in these horizon labels indicates that there is an accumulation of carbonates in the B horizon. Reacting the soil carbonate in situ with 10% HCl showed that the upper two horizons (down to a depth of 30 cm) are strongly calcareous (10% > CaCO3 > 25%). The lower horizon (30–65 cm deep) is extremely calcareous (CaCO3 > 25%). In addition, the highest laboratory-determined CaCO3 contents were found in soils from the Zaghouan region (Fig. 4b). These carbonates are either lithogenic carbonates inherited from calcareous parent materials or pedogenic carbonates resulting from a neoformation process (Pal et al. 2000).

Fig. 4
figure 4

a Organic matter (OM) and b inorganic carbon (CaCO3) contents in soil samples from 11 Tunisian regions. Values are in %, and the median values as well as the associated standard deviations are shown

To summarize, both of the positive quadrants of PC1 are dominated by samples from Nefza and Zaghouan, which had high REE levels and OM contents, whereas the samples in the negative quadrants of PC1 are much more spread out and it is much more difficult to interpret their distribution. However, it is possible to draw some conclusions about some samples. For example, F8, F10 and F11 in Gasserine and F19–F22 and F26 in Nabeul showed the highest concentrations of some REEs. These findings confirm that PC1 reflects the REE abundances in the samples.

Both of the positive quadrants of PC2 are dominated by soil samples from Zaghouan, Kerkennah, Gasserine and Gafsa, as well as F50 from Sfax. After examining their geological settings and their Ca, Sr or CaCO3 concentrations, we noticed that these sites presented high values of at least one of those variables or a limestone lithology. The negative quadrants of PC2 are dominated by samples from Nabeul, Sfax, Kairouan, Nefza and Medenine that have low contents of Ca, Sr or CaCO3. These findings further confirm that PC2 reflects the carbonate content and Sr enrichment in the samples.

Elemental characterization of olive oils

Table 2 shows the median contents (and the associated standard deviations) of elements in the olive oil samples from the 11 regions. Li, V, Cr, Co, Ga, Sn and Sb were not detected in any of the 86 analyzed samples. Al, Ge, Y, Zr, Cs and Sm were each detected in only one sample. Yb was detected in only two samples. As, Se, Cd, Gd, Tb, Er, Tm and Pb were each detected in less than nine samples. Therefore, none of these elements are considered here and they were excluded from the statistical analyses (Camin et al. 2010). Our analytical results are most similar to those of Camin et al. (2010) and Beltrán et al. (2015). Aside from linking olive oils to their geographical origins, the elemental compositions of olive oils can provide insights into environmental contamination of the olives during growth or contamination of the olives during processing. Results showed that the contents of selected heavy elements in the 86 olive oils from 11 Tunisian olive-producing regions were either undetectable (e.g. V, Cr, Co, Sn, Sb, As, Cd and Pb) or far below the maximum allowable limits established by the International Olive Oil Council (IOC 2016): Fe < 3 mg L−1; Cu, Pb and As < 0.1 mg L−1). This finding agrees with our previous results for olive oils collected in December 2015 from four governorates (Sfax, Monastir, Medenine and Gafsa) (Damak et al. 2019a), and again highlights the excellent quality of Tunisian olive oils regarding heavy metal concentrations. The elemental data for the 86 olive oil samples were characterized by a nonnormal distribution according to the Shapiro–Wilk test (p < 0.05). Therefore, the Kruskall–Wallis test was used to analyze the distributions of the remaining 22 elements across the 11 regions of interest. Regardless of the underlying causes, there were found to be large differences in elemental composition between the Tunisian olive oils. When oils from different regions were compared in a pairwise manner, 13 of the elements (B, Na, Mg, P, Fe, Cu, Zn, Rb, Sr, Mo, Ba, La and Ce) displayed significant differences in concentration (p < 0.05) between regions in at least two of those pairwise comparisons. Concentrations decreased in the following order: Mg, Na, P, Ca, Fe, Ti and Mn for major elements; B, Ba, Rb, Sr, Zn, Mo, Cu and Ni for trace elements; and Eu, Lu, Pr, La, Ce, Ho and Dy for REEs.

Table 2 Concentrations of major, trace and rare-earth elements in olive oil samples from 11 Tunisian geographical origins

Oils from the Sahel region had the highest concentrations of Mg and Fe, while the lowest abundances of these elements were recorded in oils from Nefza and Zaghouan, respectively. Na was most concentrated in oils originating from Nabeul, followed by those from Kerkennah and Tataouine, whereas the lowest concentrations of Na occurred in oils from Beja and Zaghouan. The highest concentrations of Ca were found in oils from Medenine, followed by those from Sahel, while the lowest concentrations were found in oils from Nefza and Zaghouan. Mn was not detected in many of the oils from most of the regions, and it presented very similar concentrations in Sahel and Medenine oils. These results for major elements show that the elemental profiles for the soil samples contrast somewhat with those for the olive oils. Indeed, our previous study demonstrated that the total concentration of Zn in soil and oil was inversely correlated, as was the total concentration of Pb (Damak et al. 2019a). It was therefore important to check this finding by analyzing a larger dataset and to search for plausible explanations for this phenomenon. These aims are addressed in the following sections.

Relationships between olive oil multielemental composition and terroir components (soil and climate)

The concept of multielement-based geographical traceability is deeply rooted in the hypothesis that the mineral content of the soil is the main influence on the elemental composition of the olives grown in that soil, and thus the elemental composition of the olive oil derived from those olives. However, information on how studying the elemental profile of an olive oil makes it possible to trace its geographical origin is scarce. Consequently, in order to prove the validity of multielements as geographical markers, we need to study the correlation between the elemental profile of the olive oil and that of the soil, as well as the effects of other environmental factors in the provenance of the oil that could affect its elemental profile, such as climatic and edaphic characteristics. In other words, the influence of biogeochemical processes on the multielemental composition of the oil must be quantified (Greenough et al. 2010). Below, we first quantify the effect of soil chemistry and then examine the influence of the climate on the elemental profiles of olive oils.

Correlations between the total elemental concentrations in soils and those in olive oils

Since the elemental data were not characterized by a normal distribution, the Spearman correlation coefficient was calculated to quantitatively estimate the strength of the relationship between the total content of a particular element in the soil and its concentration in olive oil. All the significant correlations obtained were found to be negative, including that for Zn (cc = − 0.23, p = 0.037), which confirmed our previous findings. These elements with negative correlations were Fe (cc = − 0.313, p = 0.004), La (cc = − 0.22, p = 0.046), Ce (cc = − 0.361, p = 0.001) and Pr (cc = − 0.231, p = 0.035). The other elements displayed weak (i.e. not statistically significant) positive (Na, Ti, Ni, Eu, Dy and Ho) or negative (Mg, P, Ca, Mn, Cu, Rb, Sr, Mo, Ba and Lu) correlations. This finding confirms our previous results regarding the significant negative correlations between the total contents of some elements in soil and the contents of those elements in olive oil, which led us to wonder about their underlying causes. In fact, for some elements, both negative and positive correlations between the contents of the element in soil and olive oil have been reported in the literature. It has been stated that a negative correlation indicates that the form of the element in question is not available to plants because the chemical conditions in soil can affect the pH and the solubilities of elements in the soil (Tyler and Olsson 2001; Ogidi et al. 2018).

Correlations with the bioavailable contents of elements in soil and the influence of soil pH

Table 3 shows the median bioavailable concentrations (and the associated standard deviations) of the elements of interest in all of the soil samples. Spearman correlation coefficients calculated upon comparing the bioavailable concentration of each element in soil to the content of that element in olive oil revealed that, for each of the following elements, there was a significant positive correlation between their contents in the soils and olive oils: Ti (cc = 0.221, p = 0.04), Fe (cc = 0.284, p = 0.01), Ni (cc = 0.261, p = 0.02) and Ba (cc = 0.240, p = 0.031) (Fig. 5).

Table 3 Bioavailable concentrations of various elements in soils from 11 Tunisian olive-producing regions
Fig. 5
figure 5

Comparison of the median concentration in olive oil to the median bioavailable concentration in soil for four elements (a Ni, b Fe, c Ti and d Ba) that displayed significant positive Spearman correlations between these two parameters. The plots show comparisons for each of the 11 Tunisian olive-growing regions. Values are in µg L−1. Bio bioavailable concentration in soil (left-hand y-axis), Oil concentration in olive oil (right-hand y-axis)

As expected, these figures show that the olive oil elemental profile better reflects soil bioavailable elemental concentrations than soil total elemental concentrations. The highest and most significant Spearman correlation coefficient was obtained for Fe. Figure 5 shows that the variation in soil bioavailable Fe across the 11 regions is similar to the variation in Fe content in olive oil. The lowest concentrations in both compartments were observed for Gasserine, Zaghouan, Nefza, Sidi Bouzid and Kairouan, whereas the highest were observed for Medenine, Sahel, Sfax, Nabeul and Gafsa. Since the Spearman correlation probes monotonic relationships, a high concentration in soil does not necessarily imply a high concentration in oil, and vice versa. For instance, Kerkennah, which is an island, had one of the lowest soil bioavailable Fe concentrations but one of the highest Fe concentrations in olive oil, which may indicate that other factors such as climate are also influential. The same applies to Ti in soil and oil from Kerkennah: we observed low soil bioavailable Ti concentrations but high Ti concentrations in olive oil for this region. However, overall, the relationship between the concentration of an element in the soil bioavailable fraction and its concentration in olive oil was clearly monotonic. These significant positive correlations prove that the composition of the soil has a strong influence on the elemental profile of olive oil. Such correlations have not been studied nor demonstrated in previous studies to the best of our knowledge. There was also no significant negative Spearman correlation between the soil bioavailable concentration of any element and its concentration in olive oil.

Nonsignificant negative and positive correlation coefficients were found when the relationship between the soil bioavailable content and the total content in oil was examined for other elements (Table 4): Mg, Mn, Rb, Sr, La, Eu and Dy displayed positive correlations whereas Na, Cu, Zn, Mo, Ce, Pr, Ho and Lu displayed negative correlations.

Table 4 Values of the Spearman correlation coefficient for the relationship between the soil bioavailable concentration of an element and its total concentration in olive oil, based on oils and soil samples from 11 Tunisian olive-growing regions

To summarize, and regardless of the significance of the correlation, Ti, Ni, Eu and Dy displayed positive correlation coefficients when the total and the bioavailable contents in the soil were considered; Zn, Cu, Mo, Ce, Pr and Lu displayed negative correlations when the total and the bioavailable contents in the soil were considered; the correlations for Fe, Ba, Mg, Mn, Rb, Sr and La changed from negative to positive when the soil bioavailable content was considered rather than the total content in the soil; and finally, the correlations for Na and Ho changed from positive to negative when the soil bioavailable content was considered rather than the total content in the soil. These findings indicate that elements which displayed a negative correlation when the soil bioavailable content was considered were not available to plants, even when the CaCl2-soluble form was used as a model for the bioavailable concentration. This could be because CaCl2-soluble forms are not suitable for use as bioavailable models or because the soil chemical conditions affect the soil pH and the solubilities of elements, especially given that CaCl2 is unbuffered so it does not modify the original pH of the soil (Tyler and Olsson 2001; Ogidi et al. 2018).

A number of authors have observed various interactions where element uptake by plants is influenced by interference and competition between elements due to changes in soil pH (Ogidi et al. 2018). In fact, all of the significant (p < 0.05) correlations between element concentrations in olive oil and soil pH were positive, implying that when the pH increases, the concentrations of these elements in olive oil increase as well. Such correlations were seen for nine elements: Na (cc = 0.428, p = 0.00008), P (cc = 0.270, p = 0.016), Ca (cc = 0.274, p = 0.015), Cu (cc = 0.325, p = 0.003), Zn (0.259, p = 0.21), Rb (cc = 0.238, p = 0.035), Sr (cc = 0.274, p = 0.015), Ce (cc = 0.295, p = 0.008) and Lu (cc = 0.300, p = 0.007). On the other hand, pH was negatively correlated to the soil bioavailable contents of the elements Cu, Zn, Rb, Sr, Ce and Lu, confirming the negative correlations obtained for Cu, Zn, Ce and Lu between the soil bioavailable concentration and the total concentration in olive oil. Soil total and bioavailable concentrations were negatively correlated to soil pH for the other elements except for total Na, in agreement with the findings of Tyler and Olsson (2001), who stated that less soluble oxides/hydrates and insoluble carbonates can form at high pH. Considering the complexity of the interactions between elements, Spearman correlation coefficients were further calculated for the matrix defined by the elements in olive oil (B, Na, Mg, P, Ca, Ti, Mn, Fe, Ni, Cu, Zn, Rb, Sr, Mo, Ba, La, Ce, Pr, Eu, Dy, Ho and Lu) and the bioavailable elements in soil samples (Na, Mg, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Rb, Sr, Mo, Ba, La, Ce, Pr, Nd, Eu, Dy, Lu, Ho and W). The results showed that there were many monotonic relationships between elements in olive oil and soil bioavailable elements. All of the elements in the olive oils displayed significant Spearman correlations with at least one soil bioavailable element except for Ho. For example, the contents of Mo in olive oils were significantly and positively influenced by the bioavailable contents of Ti, La, Ce, Pr, Nd, Dy, Ho and Lu. However, many other factors may interfere with and alter the elemental profile of an olive oil, as has been proven for other plants and foods (Zeng et al. 2011; Shen et al. 2013; Halim et al. 2014).

The results from the Spearman correlation analyses highlight the elements that can be used to establish a valid link between an olive oil and its provenance—specifically, the elements that showed significant positive correlations between their concentration in olive oil and their soil bioavailable concentration. Therefore, Ti, Fe, Ni and Ba were identified as potential markers that could facilitate the geographical traceability of Tunisian olive oils to their geographical origins.

Correlations with soil chemical characteristics and the climate

It is important to study the indirect effects of soil chemistry and the climate on the multielemental composition of olive oil. Spearman correlation coefficients were calculated to estimate the strengths of the relationships between the element concentrations in olive oil and the soil pH, EC, OM, CaCO3, average annual precipitation (AAP), relative humidity (RH), minimum temperature (MinAT), maximum temperature (MaxAT) and mean temperature (MAT). We only present the results for significant correlations below, element by element.

Mg, Ti, Ni, Mo, Ba, Pr and Eu did not show any significant correlation with the above parameters. B was negatively correlated with OM (cc = − 0.274, p = 0.012) and AAP (cc = − 0.306, p = 0.005) and positively correlated with MAT. Na was positively correlated with pH (cc = 0.428, p = 0.0008), RH (cc = 0.350, p = 0.001), MinAT (cc = 0.529, p < 0.0001) and MAT (cc = 0.545, p < 0.0001) and negatively correlated with OM (cc = − 0.493, p < 0.0001). P was positively correlated with pH (cc = 0.270, p = 0.016), MinAT (cc = 0.344, p = 0.002) and MAT (cc = 0.323, p = 0.003) and negatively correlated with EC (cc = − 0.278, p = 0.013) and OM (cc = − 0.375, p = 0.0004). Ca was positively correlated with pH (cc = 0.274, p = 0.015), MinAT (cc = 0.253, p = 0.022) and MAT (cc = 0.323, p = 0.003) and negatively correlated with EC (cc =− 0.276, p = 0.014) and OM (cc = − 0.362, p = 0.001). Mn was positively correlated with MAT (cc = 0.232, p = 0.036) and negatively correlated with EC (cc = − 0.258, p = 0.021) and OM (cc = − 0.256, p = 0.019). Fe was positively correlated with RH (cc = 0.263, p = 0.017), MinAT (cc = 0.450, p < 0.0001) and MAT (cc = 0.485, p < 0.0001) and negatively correlated with EC (cc = − 0.244, p = 0.030) and OM (cc = − 0.307, p = 0.005). Cu was positively correlated with pH (cc = 0.325, p = 0.003), precipitation (cc = 0.280, p = 0.011), RH (cc = 0.350, p = 0.001), MinAT (cc = 0.340, p = 0.002) and MAT (cc = 0.319, p = 0.004) and negatively correlated with EC (cc = − 0.416, p < 0.0001) and OM (cc = − 0.284, p = 0.009). Zn was positively correlated with pH (cc = 0.259, p = 0.021), RH (cc = 0.280, p = 0.011), MinAT (cc = 0.288, p = 0.009) and MAT (cc = 0.227, p = 0.040) and negatively correlated with EC (cc = − 0.338, p = 0.002) and OM (cc = − 0.268, p = 0.014). Rb was positively correlated with pH (cc = 0.238, p = 0.035), MinAT (cc = 0.298, p = 0.006) and MAT (cc = 0.257, p = 0.020) and negatively correlated with EC (cc = − 0.283, p = 0.012), OM (cc = − 0.375, p < 0.0001) and CaCO3 (cc = − 0.317, p = 0.003). Sr was positively correlated with pH (cc = 0.274, p = 0.015), MinAT (cc = 0.314, p = 0.004) and MAT (cc = 0.423, p < 0.0001) and negatively correlated with OM (cc = − 0.464, p < 0.0001) and precipitation (cc = − 0.350, p = 0.001). La was positively correlated with RH (cc = 0.262, p = 0.017), MinAT (cc = 0.424, p < 0.0001) and MAT (cc = 0.426, p < 0.0001) and negatively correlated with EC (cc = − 0.301, p = 0.007) and OM (cc = − 0.220, p = 0.045). Ce was positively correlated with pH (cc = 0.295, p = 0.008), RH (cc = 0.247, p = 0.025), MinAT (cc = 0.407, p = 0.0001) and MAT (cc = 0.389, p = 0.0003) and negatively correlated with EC (cc = − 0.349, p = 0.002), OM (cc = − 0.314, p = 0.004) and MaxAT (cc = − 0.278, p = 0.011). Dy was positively correlated with precipitation (cc = 0.266, p = 0.041). Ho was positively correlated with precipitation (cc = 0.227, p = 0.04). Finally, Lu was positively correlated with pH (cc = 0.300, p = 0.007).

Our findings undoubtedly show the complex and significant links between the multielemental composition of olive oil and the environment of the olive trees. These relationships prove that the elemental profile of olive oil is indeed a valid geographical traceability marker that is closely related to the terroir. Shen et al. (2013) obtained similar results to ours for the correlation between soil pH and the elemental profile of rice, such as positive correlations of pH with Ca, Fe, Cu and Zn and a negative correlation between soil pH and Ni. Tyler and Olsson (2001) also found positive correlations of Zn and Cu and a negative correlation of Ni with soil pH. As for the influence of the climate on olive oil minerals, the most noticeable result is that the elements were significantly and positively correlated with the MAT, except for Mg, Dy and Ho (which were nonsignificantly negatively correlated with the MAT) and Ti, Ni, Mo, Ba, Pr, Eu and Lu (which were nonsignificantly but positively correlated with the MAT). The significant positive correlations indicate that the element concentrations increase with temperature and vice versa. A significant positive correlation between temperature (i.e. heat) and the composition of a biosphere component (olive oil in our study) was also demonstrated in a study of the effect of the climate on leaf mineral levels in five functional groups of vegetation in China (Han et al. 2011), as well as in other studies into the effect of growing degree days (GDD; the amount of heat received in a specific region) on the mineral content of wine. The latter result has been interpreted as implying that the higher the temperature, the higher the rate of evaporation from and thus water uptake by vines, leading to higher concentrations of elements in grapes (Greenough et al. 2005).

Exhaustive interpretations of all of the relations listed earlier in this section is beyond the scope of this work, which seeks to prove the existence of a link between olive oil multielemental composition and environmental characteristics of the region of origin of the oil, in order to prove that a multielement-based approach to olive oil traceability is valid. While we have shown that the elemental profile of olive oil is undoubtedly related to provenance factors for the olives used to make the oil, the relations between the elements and provenance factors are too complex and interlinked to be strongly linear.

To summarize, geochemical characterization of soil samples from the 11 main Tunisian olive-producing regions showed that soils from Zaghouan and Nefza have very distinctive chemical compositions compared to soils from the other regions. Although we could not find a strong correspondence between the total element concentrations in soil and those in olive oil, there was a stronger correspondence of the total element concentrations in olive oil with element concentrations in the bioavailable fraction of the soil. This link appears to be due to the effects of ancillary factors related to climate and soil chemistry, which influence the availability of elements in the soil to plants. The next step was to test the effectiveness of the elemental profile when it is used in traceability models to classify olive oils according to their geographical origins.

Applicability of the olive oil multielemental composition in geographical traceability models for olive oils

The geographical traceability of olive oils relies on the transfer of chemical species from the soil to the olive oil. Thus, before we can classify olive oils in this manner, it is necessary to prove that soils can be classified based on their element concentrations (Beltrán et al. 2015). For this reason, we aimed to test the performance of a multielement-based method for predicting the geographical origin of soils.

Stepwise LDA (SLDA) was applied to the 51 variables (i.e. the 51 organic elements) and the 76 soil samples grouped into 11 origin classes. SLDA extracted nine canonical functions, the first two of which explained 93.3% of the total variance in the original dataset. SLDA also selected 21 of the 51 variables as the most discriminatory elements: Co, Mg, Si, Sr, K, Ti, Rb, U, Nd, Pb, Mn, Ca, Ta, W, Hf, Er, Ni, Tb, Al, Zn and La. The distribution of soil samples on the scatterplot of discriminant function 1 versus discriminant function 2 is shown in Fig. 6a. SLDA achieved correct classification rates during training and validation of 97.5% and 77.5%, respectively. The classification results can be seen in Table S3 of the Supplementary Information. Most of the misclassifications were due to samples from Gafsa being wrongly assigned to Sfax, Gasserine and Sahel. Sidi Bouzid samples were also mistakenly attributed to Sfax and Kerkennah. Sfax samples were misclassified as originating from Gafsa, Sidi Bouzid and Kerkennah. Sahel samples were mistakenly assigned to Sfax, and Kerkennah samples were incorrectly attributed to Sidi Bouzid. Applying SLDA to the bioavailable concentrations (Fig. 6b) gave similar results and achieved correct classification rates during calibration and validation of 77% and 67%, respectively.

Fig. 6
figure 6

SLDA analysis of element concentrations in Tunisian soil samples: a total concentrations; b bioavailable concentrations. The figure shows scatterplots of the first two canonical functions using the 11 origin groups as input

These misclassifications were undoubtedly due to the similarity of the multielemental compositions of samples from the relevant groups. In fact, only soil samples from Zaghouan and Nefza could be clearly distinguished, as can be seen from the SLDA plot. In addition, Table S3 in the Supplementary Information shows that soils from Zaghouan, Nefza and Kairouan were discriminated with 100% accuracy. Based on the bioavailable concentrations, Nefza and Zaghouan soils were also predicted with 100% and 91% accuracy, respectively. These high validation rates confirm that the soil samples from these regions had distinct chemical compositions, as discussed in the previous paragraphs.

The task of simultaneously classifying olive oil samples from 11 different regions based on their elemental compositions would be expected to be a very difficult one, considering the previous results obtained with soil samples. While 48 of the 51 variables showed significant variations among the soil samples from the 11 regions, with clear differences in element concentrations between the regions (especially for Zaghouan and Nefza), when oils from different regions were compared in a pairwise manner, only 13 of the elements showed significant differences in at least two pairwise comparisons, and the differences were less distinct than for the soil samples. In fact, the Tunisian olive-producing regions form a contiguous area, especially in the central part of the country, where most olive oil production takes place. A first attempt to classify olive oil samples according to their governorates of origin was conducted using a SLDA with the least strict conditions (Wilk’s lambda with an F to enter of 1). The SLDA showed that ten functions were automatically fitted, but the Wilk’s lambda test of functions revealed that only the first three functions contributed significantly to the model and had eigenvalues that were greater than 1. The first three functions accounted for 68.2% of the variance explained by the SLDA model, but the first two functions accounted for only 53.4% of the total variance. This is why the bivariate plot of F1 versus F2 (Fig. 7a) may not accurately reflect the classification performance achieved by the model. The SLDA was able to correctly classify 74% and 41% of the olive oil samples during training and cross-validation, respectively (Table S4 in the Supplementary Information). It is clear that samples from different groups overlapped in multielemental composition to some extent, meaning that they could not be accurately separated (this was even the case for samples from Zaghouan and Nefza: the corresponding validation accuracies were 54.5% and 0%, respectively). This was expected, as only 13 elements showed a significant difference between at least two origins, and because the high number of classes makes the classification task very challenging. SLDA selected 18 of the 51 elements to use in the classification model: Na, Mg, Rb, Sr, Cu, Ho, Ba, Ti, Ni, P, B, Ca, La, Mn, Fe, Dy, Eu and Zn.

Fig. 7
figure 7

Scatterplots of the first two canonical discriminant functions from the SLDA of Tunisian olive oil samples: a obtained using all 11 origin groups as input; b obtained using three origin groups (Nefza, Zaghouan, and a group combining the remaining regions) as input

In order to make use of the previous results for the soil samples, which showed that Zaghouan and Nefza have distinct soil compositions, three groups were used as input in the subsequent SLDA analysis (F to enter = 1): Nefza, Zaghouan, and all the rest of the regions combined (the “other TN” group: Sfax, Sahel, Kerkennah, Sidi Bouzid, Kairouan, Gafsa, Gasserine, Nabeul and Medenine; see Fig. 7b). This SLDA extracted two functions that had significant Wilk’s lambda values, although only the first function had an eigenvalue above 1. These two functions accounted for 100% of the total variance explained by the model. The correct classification rates during training and prediction were 95.2% and 86.7%, respectively, and improved upon decreasing the number of groups inputted to the model. This improvement can also be seen in the scatterplot of the first two discriminant functions (Fig. 7b). The model that considered three groups performed well when it was used to predict the origins of olive oil samples from other Tunisian regions (92.8% correct classification) and from Nefza (66.7%; this was 0% when 11 groups were considered in the model), but gave only mediocre performance for olive oil samples from Zaghouan (54.5%; this percentage was similar to that obtained when 11 groups were considered) (see Table S5 in the Supplementary Information). The variables that were best for discriminating between olive oil samples based on geographical origin in the SLDA were Ni, Sr, Ti, Mg, Eu, Ho, B, Ba and Na.

These results show that the elemental profile of the olive oil is a promising marker for authenticating the geographical origins of olive oils from Zaghouan and (especially) Nefza. A study that includes a larger set of samples from these two regions is needed to confirm these findings, which could help to valorize Nefza and Zaghouan olive terroirs and protect their production. As an aside, Tunisian olive oils from the northern regions, such as Zaghouan and Nefza, have been proven to have exceptionally high polyphenol contents—as high as 10–20 times the polyphenol contents in European olive oils (JICA 2014).

Conclusion

Our study is, to the best of our knowledge, the first to exhaustively report the chemistry of numerous (including lesser known) mineral elements in the agricultural soils and olive oils from many regions in Tunisia.

The concentrations of the bioavailable elements Ti, Fe, Ni and Ba in soils from a particular region were proven to be significantly positively correlated with the concentrations of those elements in olive oils originating from that region. However, soil chemistry and the climate both directly and indirectly interact with olive trees, affecting their uptake of chemical elements, so these factors play significant roles in shaping the mineral composition of olive oil, as proven by the numerous significant correlations between all of these factors. In fact, the relationships between terroir factors and olive oil multielemental composition prove that olive composition is closely linked to the growing environment of the olives, indicating that the multielemental composition of an olive oil is a powerful marker of its geographical origin. However, the effects of these factors (as well as the effects of interactions between the factors) on the concentration of each element in olive oil remain difficult to quantify. An absence of evidence for strong linear relationships that would allow the development of regression models is what makes this task so challenging. Predicting the concentration of each element at each geographical location based on soil and climate factors is currently even more difficult to achieve. However, our study lays the foundations for further quantitative investigations aimed at deepening our understanding of the effects of biogeochemical drivers on the geographical distribution of the elements.

As for the performance of the multielement-based model in facilitating the geographical traceability of olive oils in the geological and climatic context of Tunisia, our results show that for a very complex classification task in which olive oils were classified into the 11 studied Tunisian regions, the combined use of the elemental profile and chemometrics may not always be successful. This is due to the lithological settings of most of the olive-producing regions of Tunisia (especially those located above the Quaternary alluvial sediments), except for soils from Zaghouan and Nefza. Therefore, the SLDA gave very satisfactory results when it was used to predict the origins of soil samples from the latter two regions. However, the results from the SLDA of olive oil data were not as satisfactory, and even olive oils from Nefza and Zaghouan could not be clearly distinguished from olive oils originating elsewhere in Tunisia. The discrepancies in the geochemical signatures and therefore the unsatisfactory SLDA results for olive oil discrimination were undoubtedly caused by the complex relationships between the climate and the soil chemical characteristics. Nevertheless, when SLDA was conducted to categorize the olive oils into just three origin groups (Nefza, Zaghouan and a combined group comprising the other regions), results improved, and the correct classification rates for the olive oils from Nefza and Zaghouan were 67% and 55%, respectively. This improvement occurred because the classification problem became easier when only three groups were considered instead of 11. Consequently, chemometric modelling of olive oil data is more effective when only a limited number of regions of origin are considered. A combined methodology based on both the multielemental composition and chemometrics is therefore a powerful and very useful means to discriminate olive oils from Nefza and Zaghouan from olive oils originating elsewhere in Tunisia. This approach could help to establish GI labels for olive oils from Nefza and Zaghouan, as it highlights the unique characteristics of the terroir components for the olive oils (namely the soil geochemical and edaphic characteristics) and the strong links between the elements in the olive oils and their provenance factors, and demonstrates that it is possible to accurately discriminate olive oils from these regions. Establishing GI labels for these olive oils should be beneficial in terms of the resulting economic benefits and because it would be expected to increase the competitiveness and status of Tunisian olive oils in the international market. Thus, a study with a higher sampling density that covers nearly 100% of the olive farms in the Nefza and Zaghouan regions and includes samples taken during at least two harvesting seasons (to investigate the possible effect of harvesting year on the multielement marker) is needed to move forward with this strategy.