Introduction

Wine is a very complex matrix containing polyphenols, sugars, tannins, minerals, vitamins, organic acids, and so-called flavoring substances, i.e., compounds containing in their structure an ester or aldehyde group, thanks to which each wine has its own specific characteristics [1]. Terroir is a viticulture concept of high interest among scientists. According to this concept the sensory attributes, and thus the uniqueness of wine, are strictly related to the environmental conditions under which grapes are cultivated. Studying terroir is a highly challenging task due to the number of variables involved, e.g., climate conditions, soil type, cultivar, and human practices [2].

The geographic origin, and thus the soil on which grapes are grown, is one of the most decisive factors in terms of wine quality. Specific types of soil, approximated as stone, gravel, sand, and clay, differ in permeability, thermal stability, moisture retention, soil fertility, and other parameters. Clay and loam content have a decisive influence on the sorption and water capacity of the soil. Calcium content (e.g., CaCO3) has an impact on the so-called crumb structure of the soil, which is advantageous in keeping the balance between soil permeability, the appropriate level of moisture, and the penetration of water into deeper layers [3]. It has previously been shown that these parameters can be used as inputs in multivariate statistical analysis-based classification of wine, and that ratios of chemical features are more useful in this regard than individual parameters [4,5,6].

Grapevine is not a demanding plant; nevertheless, it will not root in soils that are too moist, too shallow, heavy, or compact. The roots should have access to at least 75 cm of soil to protect them against freezing in winter. Moreover, to provide a stable water balance, and thus healthy fruit with a high concentration of sugars and aromas, the soil should be moderately moist and permeable, with good drainage and fertility [7]. This is why the investigation of soil composition and its effect on the grapes is of interest to both scientists and wine growers.

The aim of the study was to compare the parameters characterizing wines, such as pH, conductivity, total organic carbon (TOC), and concentration of anions (F⁻, Cl⁻, NO₂⁻, NO₃⁻, PO₄³⁻, SO₄²⁻). For the first time, the determinations were performed for samples of red and white wines from seven Polish vineyards. In addition, chemometric analysis was performed to investigate certain correlations between the determined parameters. The results were compared in terms of the effect of the soil type on the concentration of compounds contained in wine and the pH and conductivity parameters. Machine learning was employed to assess the possibility of classifying the wine samples based on the soil type.

Results and discussion

Results obtained during tests are summarized in Tables 1 and 2. The pH values for red wines ranged from 3.15 to 3.96, which is a wider range than in the case of white wines (pH 2.78–2.93, except for sample 19W, whose pH was 3.14). White wines are thus more acidic than red wines. Higher pH (reduced acidity) may be attributed to the fact that the soil was fertilized with CaCO3 in order to neutralize the acidity.

Table 1 Information on analytes determined for the given wine samples (n = 3)
Table 2 Content of determined ions in the wine samples (n = 3)

The conductivity of the samples was approximately 220 mS cm−1, ranging from 197 to 252 mS cm−1. The conductivity was higher for white wines; the highest value (252.67 mS cm−1) was recorded for sample 17W. The range of conductivity values was narrower for white wines than for red wines.

The highest content of organic carbon was noted for the red wine sample 14R. It was a sample of the Heridian wine, produced from grapes grown in the “Stok” vineyard. This result is a consequence of the improvement of the sandy soil, which had not been suitable for cultivation of grapes, through frequent irrigation and fertilization. The average TOC content is 62.51 g dm−3 for red wines and 53.68 g dm−3 for white wines. The range of the obtained values is narrower for red wines than for white wines. It should be noted that both the highest TOC value and the narrowest range of results are observed for red wines, which may indicate greater stability of this parameter for this group of wines.

In general, the lowest contents of fluoride ions were observed for wines produced from grapes grown on loam soils, and the highest for those grown on sandy soils. However, there were exceptions: the single lowest fluoride concentration was found in a wine from grapes grown on sandy soil (sample 25W). Once again, sample 14R stands out, with the highest concentration of these anions (977.6 mg dm−3) in the studied group of wines. When the samples are grouped by wine color, red wines show, on average, a three times higher fluoride concentration than white wines. The concentration of chloride ions in the wines is very low, though higher for red wines.

The contents of both nitrate (NO₃⁻) and nitrite (NO₂⁻) in white wines were below the limit of quantification. In red wines the concentrations of both ions were slightly higher, with nitrate considerably more abundant than nitrite. Nitrate could be quantified in 67% of the studied samples; in the remaining third its concentration was too low to be determined. The highest concentration was recorded for sample 16R (95.2 mg dm−3). Both nitrite and nitrate could be determined in six wine samples, among them sample 14R, in which the nitrate concentration was also one of the highest.

Red wines produced from grapevines growing on clay soil are characterized by a high content of phosphate ions. The concentration is much lower in white wines. Apart from this, no further correlations were observed.

According to the now outdated PN-A-79122 standard, the level of sulfur dioxide in wine should not exceed 260 mg dm−3. Of the tested samples, as many as seven exceed this concentration (six red wines and one white). The concentrations found for samples 7R and 13R are of particular concern, as they exceed the limit by 378.8 mg dm−3 and 350.8 mg dm−3, respectively. Red wines are characterized by higher concentrations of sulfate ions, which result from the addition of potassium metabisulfite to protect the product from contamination with wild yeast and from the proliferation of fungi and bacteria.

The analysis performed using ion chromatography indicated the presence of high concentrations of phosphate, sulfate, and fluoride ions in the wines. The concentrations of nitrate and nitrite were below the limit of quantification for most of the samples.

Determination of the correlation between given parameters: chemometric analysis

Due to the ambiguity of the results, chemometric analysis was carried out using the Orange v. 3.13 Python toolkit [8] in order to investigate the interactions of the determined parameters. The results of instrumental analysis were used as input data for multivariate statistical analysis regarding soil type. In the first step, the independent variables were normalized through centering by average and scaling by standard deviation. Next, the variables with the highest impact on classification were selected based on the analysis of variance (ANOVA). The selected variables were used for principal component analysis (PCA) and for supervised machine learning. Variance analysis and PCA were carried out to reduce the ratio of the number of independent variables (the dimensionality of the data set) to the number of measurements, and consequently to reduce the likelihood of accidental correlation (the so-called “Voodoo” correlation) [9]. Based on tenfold stratified cross-validation, the most suitable machine learning algorithm was selected from the following methods:

  • Support vector machines (SVM),

  • k-nearest neighbors (k-NN),

  • Naive Bayes,

  • Random Forest.

The selected model was validated by using randomly selected 66% of the data set for training, and the remaining 34% for testing.
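The normalization and variance-based ranking steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Orange workflow used in the study: the function names `standardize` and `anova_f` and the small data set are hypothetical and serve only to show the calculation.

```python
from statistics import mean, stdev


def standardize(column):
    """Center a column on its mean and scale it by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def anova_f(values, groups):
    """One-way ANOVA F statistic for one variable split by group label."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    grand_mean = mean(values)
    k, n = len(by_group), len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    # Variation of the individual values around their own group mean.
    ss_within = 0.0
    for vs in by_group.values():
        group_mean = mean(vs)
        ss_within += sum((v - group_mean) ** 2 for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical illustration data (NOT the measurements from Tables 1 and 2).
soil = ["sand", "sand", "sand", "loam", "loam", "loam", "clay", "clay", "clay"]
data = {
    "pH": [3.0, 3.1, 3.2, 3.4, 3.5, 3.6, 3.8, 3.9, 4.0],
    "TOC": [55.0, 61.0, 58.0, 57.0, 60.0, 56.0, 59.0, 54.0, 62.0],
}

scaled = {name: standardize(column) for name, column in data.items()}
scores = {name: anova_f(column, soil) for name, column in scaled.items()}
# Variables ordered from the highest to the lowest F statistic.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The F statistic of a single variable does not change when that variable is centered and scaled, so the standardization matters mainly for the subsequent PCA and for distance-based classifiers such as k-NN.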

Analysis of variance showed that the highest impact on the classification of samples in terms of soil type can be observed for conductivity, pH, Cl⁻, F⁻, and TOC, in this order. The scatter plot of the first two principal components resulting from the PCA is shown in Fig. 1. These first two PCs explained over 77% of the total variance, and the first four PCs contained more than 99% of the total variance. Table 3 lists the accuracy parameters for the classification of samples based on soil type using four different machine learning algorithms.
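The share of the total variance carried by each principal component can be illustrated with a small pure-Python sketch. It assumes standardized variables and finds the eigenvalues of their covariance matrix by power iteration with deflation, a simple stand-in chosen for brevity rather than the routine used by Orange. All function names and the three data columns are hypothetical.

```python
from statistics import mean, stdev


def standardize(column):
    """Center a column on its mean and scale it by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def covariance_matrix(columns):
    """Sample covariance matrix of equally long columns."""
    n = len(columns[0])
    centered = []
    for column in columns:
        m = mean(column)
        centered.append([x - m for x in column])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0.0:  # nothing left to extract
            return 0.0, vec
        vec = [x / norm for x in new]
    # Rayleigh quotient of the converged unit vector.
    value = sum(v * sum(m * w for m, w in zip(row, vec))
                for row, v in zip(matrix, vec))
    return value, vec


def explained_variance_ratios(columns, n_components):
    """Fraction of the total variance carried by each leading component."""
    cov = covariance_matrix(columns)
    size = len(cov)
    total = sum(cov[i][i] for i in range(size))
    ratios = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component that was just found.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(size)]
               for i in range(size)]
    return ratios


# Hypothetical illustration data (NOT the measurements from the study).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
]
scaled = [standardize(column) for column in columns]
ratios = explained_variance_ratios(scaled, n_components=3)
cumulative = [sum(ratios[: i + 1]) for i in range(len(ratios))]
print([round(r, 3) for r in ratios])
print([round(c, 3) for c in cumulative])
```

The cumulative list is the quantity quoted in the text: its second entry corresponds to the variance explained by the first two PCs taken together.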

Fig. 1

The result of the principal component analysis of soil parameters, with samples of red wine marked in red and white wine marked in green. The first principal component (PC1) contains 59% of the total variance and the second (PC2) contains 18% of the total variance

Table 3 Evaluation of the performance of machine learning algorithms in classification of wine samples in terms of soil type; area under the ROC curve (AUC), classification accuracy, and precision

Based on the cross-validation, it was determined that in the considered scenario the best classification accuracy can be achieved using the Random Forest method. Validation of the Random Forest model with separate training and testing data sets yielded satisfactory results, with only a single sample of wine from grapes grown on loamy soil misclassified as a sample of wine from grapes grown on sandy soil. In summary, the chemometric analysis of the measurement data showed that wines can be classified in terms of soil type using machine learning techniques. This means that, taking into account parameters such as pH, the concentrations of Cl⁻ and F⁻, and the value of TOC, it is possible to determine the type of soil on which the vines were grown. This might lead to the development of cost-effective tools for the authentication of wines, especially since the determination of the parameters with the highest variance with respect to soil type, i.e., conductivity and pH, does not require sophisticated analytical equipment. Furthermore, knowledge of the relationship between soil type and composition variables might be important for producers and growers when selecting grape varieties tailored to the particular soil conditions in the vineyard.
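The idea of stratified cross-validation can be sketched in a few lines of plain Python. The sketch below is an illustration under stated assumptions, not the procedure run in Orange: it uses a 1-nearest-neighbour classifier instead of Random Forest to stay short, it does not shuffle the samples, and the names `stratified_folds`, `nearest_neighbour`, and `cross_validated_accuracy` as well as the data are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds so that every class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def nearest_neighbour(train_x, train_y, point):
    """Label of the closest training point (1-NN, squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, point))
                 for row in train_x]
    return train_y[distances.index(min(distances))]


def cross_validated_accuracy(features, labels, k):
    """Fraction of samples classified correctly while held out of training."""
    correct = 0
    for fold in stratified_folds(labels, k):
        held_out = set(fold)
        train_x = [f for i, f in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        for i in fold:
            if nearest_neighbour(train_x, train_y, features[i]) == labels[i]:
                correct += 1
    return correct / len(labels)


# Hypothetical standardized (conductivity, pH) pairs, NOT data from the study.
features = [
    (-1.0, -1.1), (-1.2, -0.9), (-0.9, -1.0),
    (0.0, 0.1), (0.1, -0.1), (-0.1, 0.0),
    (1.0, 1.1), (1.1, 0.9), (0.9, 1.0),
]
labels = ["sand"] * 3 + ["loam"] * 3 + ["clay"] * 3

folds = stratified_folds(labels, k=3)
accuracy = cross_validated_accuracy(features, labels, k=3)
print(folds, accuracy)
```

With k set to 10 the same functions reproduce the tenfold scheme; classes with fewer than ten samples are then simply absent from some of the test folds.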

Conclusion

During the research, 25 wine samples from seven vineyards located in different parts of Poland were analyzed. Parameters such as pH, conductivity, concentrations of the ions Cl⁻, F⁻, NO₂⁻, NO₃⁻, PO₄³⁻, and SO₄²⁻ (IC), and total organic carbon content (TOC) were determined. Finally, the obtained results were subjected to chemometric analysis. Among the tested samples, red wines were characterized by lower acidity (higher pH) and a higher concentration of fluoride ions. Phosphate ions are prevalent in wine. The analysis showed that the soil on which the grapevines are grown has a significant impact on the values of selected parameters and on the content of selected chemical compounds. In addition, it showed the possibility of assigning a soil type based on such parameters as pH, the concentrations of Cl⁻ and F⁻, and TOC, which is the starting point for further analysis.

The data obtained in this study can be used not only to characterize wine samples originating from Poland, but also to provide important information on parameters that are useful as variables when designing multivariate statistical methods for wine classification according to soil type. This detailed information can be useful to wine producers working on an industrial scale as well as to those making wine for personal use.

Experimental

The pH-meter Hi 8314 from Hanna was used to measure pH and conductivity. The Shimadzu TOC-V CSH Total Organic Carbon Analyzer was used to determine the total organic carbon content; for this measurement the samples required dilution in a ratio of 1:300. The content of the selected ions was determined using a Dionex ICS-3000 Ion Chromatograph. Samples had to be diluted in a ratio of 1:20 prior to the analysis.
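Because the instruments read diluted samples, each reading has to be scaled back to the undiluted wine. The short sketch below shows this arithmetic. It assumes that a ratio of 1:300 means one part of wine in 300 parts of final volume (the text does not specify the convention), and the function name and the example reading are hypothetical.

```python
# Dilution factors taken from the text; the interpretation of "1:N" as
# one part of sample in N parts of final volume is an assumption.
TOC_DILUTION = 300
IC_DILUTION = 20


def undiluted_concentration(measured, dilution_factor):
    """Scale a reading taken on a diluted sample back to the original wine."""
    return measured * dilution_factor


# Hypothetical instrument reading of 0.2 g dm-3 on the diluted TOC sample.
toc_in_wine = undiluted_concentration(0.2, TOC_DILUTION)
print(toc_in_wine)
```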

Quality assurance

The measuring instrument was calibrated by external calibration (the calibration curve method), using appropriately prepared standard solutions of the ions determined by IC. The correlation coefficient observed for the ions exceeded 0.999. The sensitivity of the developed method was assessed in terms of the limit of detection (LOD) and the limit of quantification (LOQ), which were determined according to the technique recommended by the OIV [10]. The two limits were based on the standard deviation of the intercept (Sa) and were calculated from the expressions LOD = (3.3 × Sa)/b and LOQ = 3 × LOD, where b is the slope of the calibration curve. The obtained results are listed in Table 4.
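The LOD and LOQ expressions can be illustrated with an ordinary least-squares calibration line. The sketch below is a minimal example under stated assumptions: the standard concentrations and detector responses are invented, the function name `calibration_limits` is hypothetical, and Sa is computed with the usual formula for the standard error of the intercept of a straight-line fit.

```python
def calibration_limits(concentrations, responses):
    """Fit response = a + b * concentration and derive LOD and LOQ from Sa and b."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, responses))
    b = sxy / sxx                 # slope of the calibration curve
    a = mean_y - b * mean_x       # intercept
    residual_ss = sum((y - (a + b * x)) ** 2
                      for x, y in zip(concentrations, responses))
    s_res = (residual_ss / (n - 2)) ** 0.5
    # Standard deviation (standard error) of the intercept.
    s_a = s_res * (sum(x * x for x in concentrations) / (n * sxx)) ** 0.5
    lod = 3.3 * s_a / b
    loq = 3 * lod
    return {"slope": b, "intercept": a, "Sa": s_a, "LOD": lod, "LOQ": loq}


# Hypothetical standards (mg dm-3) and detector responses, NOT study data.
standards = [1.0, 2.0, 5.0, 10.0, 20.0]
signals = [0.52, 1.01, 2.48, 5.05, 9.97]

limits = calibration_limits(standards, signals)
print({name: round(value, 4) for name, value in limits.items()})
```

Both limits come out in the concentration units of the standards, because Sa carries the units of the response and b the units of response per concentration.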

Table 4 Basic validation parameters of the procedure for individual anions

Samples

The research was carried out on 25 wine samples (18 red and 7 white) from seven Polish vineyards: Zodiak (80% clay sand, 20% clay soil), Przy Talerzyku (loam soil), Pod Orzechem (sandy-loam soil), Winnica Kozielec (sandy-loam soil), Spotkaniówka (loamy-sand soil), Stok (sandy soil), and Nad Dworskim Potokiem (luvisol, gleysol). The vineyards are located throughout the country, from the Pomeranian Voivodeship (~125 m a.s.l.) to the Podkarpackie Voivodeship (~320 m a.s.l.) [11,12,13,14,15,16,17].

The wines studied were produced between 2014 and 2016; 19 of them were dry and 6 were semi-dry. The alcohol content ranged from 10 to 13.6%. During production, potassium pyrosulfite was added to protect the product against contamination with wild yeast and the development of fungi and bacteria. In all cases filtration was carried out only once, with one exception: the wine obtained from Frontenac grapes (10R).