Introduction

According to Council Directive 2001/110/EC [1], honey is defined as “the natural sweet substance produced by Apis mellifera bees from the nectar of plants, or from secretions of living parts of plants, or excretions of plant-sucking insects on the living parts of plants, which the bees collect, transform by combining with specific substances of their own, deposit, dehydrate, store and leave in the honeycomb to ripen and mature”. Depending on its botanical origin, honey is classified in two kinds, i.e., blossom (derived from the nectar of plants) and honeydew honey (derived mostly from secretions of insects).

Generally, in terms of composition, honey is composed of sugars (70–80%), primarily glucose and fructose, water (15–17%), and minor quantities of various health-promoting compounds, such as proteins, amino acids, vitamins, minerals, phenolic compounds, enzymes, and other biological compounds [2, 3]. Even though the different honeys’ overall composition is similar, the geographical and botanical origin affects the chemical constituents, specifically those in minor quantities and, hence the final quality characteristics of each type of honey [4].

Polyfloral honey is a mixture of different proportions of sources of nectar and honeydew, since bees collect nectar from a variety of plants [5]. On the other hand, monofloral honey is produced from nectar that mostly comes from one type of plant and has particular organoleptic characteristics [6]. Among the monofloral blossom honeys, thyme honey is most prevalent in Mediterranean countries, and is very appreciated by consumers because of its pleasant flavor and odor. Pine honey is an another popular monofloral type and it is produced from honeydew secreted by the insect Marchalina hellenica (Coccoidea: Homoptera), which lives as a parasite in some pine tree species, such as Pinus brutia and Pinus halepensis [6, 7]. Pine honey is produced only in Greece and Turkey [8]. Thyme and pine honeys constitute a significant proportion of honey production, and therefore, honey consumption in the countries of the Mediterranean basin.

There is a growing commercial interest in honey, mainly due to its antimicrobial and antioxidant properties [9, 10] and, therefore, monofloral honeys of distinct botanical and geographical origin are more expensive than polyfloral ones. In order to ensure the quality of monofloral honeys, reliable methods have been developed to determine the authenticity of botanical and geographical origin of the product. The characterization of honey origin is carried out with pollen analysis, as well as with the determination of physical and chemical parameters of honey, such as moisture content, sugar profile, electrical conductivity, 5-hydroxymethylfurfura (5-HMF) content etc.; EU legislation (2001) [1] sets limits for the aforementioned parameters. The classical analytical techniques for the determination of these parameters provide valuable information for the classification of honey, however they are time consuming, require expensive equipment, intricate sample preparation, and experienced personnel. Hence, in recent years, there have been efforts to determine the botanical and geographical origin of honey with alternative approaches, such as spectroscopic techniques including Fourier transform infrared spectroscopy (FTIR) [11,12,13], near infrared spectroscopy (NIR) [14, 15], Raman spectroscopy [16,17,18,19] and fluorescence spectroscopy [20,21,22]. These methods offer several advantages, such as being simple, quick, non-destructive, and relatively inexpensive to carry out [23, 24]. In order to forecast the origin of honey, the spectral data from the above techniques are combined with chemometrics.

For classification problems, e.g., discrimination of geographical and botanical origin of honeys, several approaches have been applied, such as one-class and multi-class classification methods. The most widely used one-class classification method for chemometric analysis is Data Driven - Soft Independent Modelling of Class Analogies (DD-SIMCA), a supervised classification method which assigns new objects to the target class when the degree of similarity obeys the decision criteria [25, 26]. Specifically, DD-SIMCA uses information only from the target class to optimize the model and it is used when someone is interested to assess whether a sample belongs to a specified class or not. On the other hand, multi-class classification methods use information from all classes. Widely-used techniques in this area are Partial Least Squares – Discriminant Analysis (PLS-DA) and Random Forest (RF) algorithm. PLS-DA is a supervised classification method which can reduce the large number of initial independent variables to a smaller number of latent variables. Similar to Principal Component Analysis (PCA), latent variables are computed as linear combinations of the original variables for explaining the maximum covariance of input and output data. PLS-DA uses information from both target and non-target objects to optimize the model [27]. RF algorithm is an ensemble learning approach for building predictive models for classification and regression tasks. The method builds a series of decision trees that act as weak classifiers. These classifiers may have a poor performance individually, but they perform very well in an aggregate form. RF has great accuracy on classification tasks across a wide range of applications [28].

Recently, ultraviolet - visible (UV-vis) spectroscopy has been successfully used as a promising spectral technique to determine honey adulteration [29,30,31] because it provides multiple advantages, such as being a simple, inexpensive, and fast method with minimal or no sample preparation. However, there are very few studies that use UV-vis spectroscopy to determine the botanical [5, 32, 33] and geographical [34, 35] origin of honey. Specifically, Suhandy and Yulia [33] utilized SIMCA for detecting the botanical origin of Indonesian honeys, while Orfanakis et al. [32] employed orthogonal partial least squares discriminant analysis (OPLS-DA) in order to classify Greek honeys based on their botanical origin, i.e. thyme, polyfloral and honeydew honeys. Roshan et al. [5] utilized SIMCA to identify the monofloral Yemeni Sidr honeys using thirteen Sidr and twenty-five honeys for different botanical origins, while Ansari et al. [34] successfully applied SIMCA to discriminate monofloral Sidr honey (Ziziphus sp.) from honeys of different botanical origins and geographical locations of Saudi Arabia using a total of eighteen samples. However, to the best of our knowledge, there are no studies using UV-vis spectroscopy for the discrimination of both the geographical and botanical origin of the honeys which came from different countries of the Mediterranean basin. In addition, previous studies have only utilized one classification method, either SIMCA or OPLS-DA. In our work, both one-class and multi-class classification methods were compared based on their ability to predict the honey origin. Taking the above into consideration, the aim of the present study was to differentiate Mediterranean honey samples with regards to their botanical and geographical origin, using a simple and inexpensive method based on UV-vis spectroscopy combined with several multivariate statistical analysis.

Materials and methods

Honey samples

A total of 151 commercial honey samples were collected from different geographical regions of Mediterranean countries. Specifically, the honey samples were gathered from Greece (46 samples), Tunisia (42 samples), Turkey (31 samples), Spain (22 samples), and Malta (10 samples) and they were harvested during the 2021–2022 season. In the present study, three different types of honey were evaluated, namely thyme (72), pine (20), and polyfloral (59), that were collected either from beekeepers or from the market. The geographical and botanical origin of the honey samples is shown in Table 1. All samples were kept in clean plastic containers and stored under refrigeration until analysis.

Table 1 Botanical and geographical origin of honey samples

UV-vis spectroscopy

Prior to spectral measurements, honey samples were heated at 40 °C for 60 min in a water bath in order to dissolve any crystals and obtain homogeneous samples. For spectroscopic measurements, honey samples were diluted at 3% (w/w) using double-distilled water. The UV-vis spectra were acquired in the range of 190–900 nm at 0.5 nm interval using a UV-vis Spectrophotometer (UV-1700, Shimadzu Corporation, Japan). For sample analysis, diluted honey samples were placed in quartz cells and distilled water was used as blank for absorbance spectra measurements. All measurements were carried out at room temperature. Five specimen replicates were used for each honey sample and the average spectrum was used for further analysis and construction of the chemometric models.

Chemometric analysis

In order to explore the data and build the classification models using UV-vis spectra, three widely-used techniques in the field of chemometric methods were utilized in this work: PCA, PLS-DA, and DD-SIMCA. Furthermore, the RF algorithm, a machine learning algorithm, was also included in the present study.

For the generation of the models, the data set was divided into two groups, the training set (80% of the samples) to build the models and the test set (20%) to validate the models, performing cross validation. RF and PLS-DA utilize the membership of samples from various classes to generate the models (i.e., classes for each type of honey when considering the botanical origin or classes for each country when considering the geographical origin). On the other hand, DD-SIMCA models were built in order to predict a specific class, i.e., a specific geographical and/or botanical origin, since they target a single-class classification problem. For all models, the significance level was set at 5% (α = 0.05). The performance of the models was evaluated using the criteria of accuracy, precision, recall, and F1-score for RF and PLS-DA and accuracy, sensitivity, and specificity for DD-SIMCA, which were calculated by the following equations.

$$Accuracy= \frac{(TP+TN)}{Total}= \frac{(TP+TN)}{(TP+FP+FN+TN)}$$
(1)
$$Precision= \frac{TP}{Predicted\,positive}= \frac{TP}{(TP+FP)}$$
(2)
$$Recall\,or\,Sensitivity = \frac{TP}{Actual\,positive}= \frac{TP}{(TP+FN)}$$
(3)
$$Specificity = \frac{TN}{Actual\,negative}= \frac{TN}{(FP+TN)}$$
(4)
$$F1\text{-}score = 2\times \frac{(Precision\times\,Recall)}{(Precision+Recall)}$$
(5)

where TP: True Positives, TN: True Negatives, FP: False Positives, and FN: False Negatives from the confusion matrix.

All metric values reported in the rest of the paper refer to those achieved on the test set. In case of the DD-SIMCA model, the sensitivity was calculated both on the training and the test set since the test set included only 20% of the data belonging to the original class.

Before models’ construction, different preprocessing techniques for UV-vis spectra were investigated in order to eliminate systematic variations on the baseline and determine the optimum one, such as Savitzky–Golay smoothing and derivatives (1st and 2nd with a second order polynomial and 11-point window size), and standard normal variate (SNV). The best performance was achieved using a combination of the SNV technique, the Savitzky–Golay smoothing method, and the 1st derivatives. Moreover, mean center method was performed in the preprocessing stage.

The chemometric analysis was performed in Python with the scikit toolbox for accessing the implementation of PCA, RF, and PLS-DA. The DD-SIMCA models were built using a MATLAB code available at https://github.com/yzontov/dd-simca.

Results and discussion

UV-vis spectra

The UV-vis spectra of thyme, pine, and polyfloral honey samples from different Mediterranean counties are depicted in Fig. 1. As it can be seen, large variations in the absorbance spectra in the region between 220 and 550 nm were observed among honey samples from different geographical and botanical origin. According to Roshan et al. [5] the ultraviolet – visible region between 200 and 500 nm contains information from UV-absorbing systems and different chromophores exemplified in various components, such as sugars, phenolics, flavonoids, and conjugated systems.

Fig. 1
figure 1

The original UV-vis spectra of the honeys with different botanical and geographical origin in the range of 190–900 nm

As depicted in Fig. 1, several peaks were observed at the wavelength range of 220–340 nm, and namely around at 266, 270, 280, 290, and 330 nm. According to literature, the absorbance band in the range of 200–260 nm is mainly due to sugars, specifically glucose and fructose, and phenolic compounds, which are present in honey [5, 31, 32]. The absorption band in the range of 260–300 nm is associated with amino acids, mainly tryptophan, proteins, and phenolic compounds [33, 36], while the band at 300–340 nm, which is noticed in some samples, is related to flavonoids and some other phenolic compounds [32, 36].

The UV-vis spectra of thyme, pine, and polyfloral samples show significant variations in the range of 220–350 nm, as illustrated in Fig. 1, which are associated with their chemical composition and specifically with the sugar profile and phenolic content of honeys. According to the legislation (EU Directive 110/2001), the sugar content of blossom honeys has to be over 60 g/100 g, while the limit for honeydew honeys is 45 g/100 g. The blossom honeys contain approximately 65–70% of common sugars, glucose, and fructose, and also contain some other carbohydrates, such as maltose, sucrose, trehalose, and turanose in minor quantities [37]. On the other hand, honeydew honeys contain lower concentration of glucose and fructose than polyflorals, the other carbohydrates are absent, while contains higher quantities of oligosaccharides, mainly trisaccharides, melezitose, and raffinose [3, 37]. Regarding the flavonoids and phenolic compounds, both the profile and the concentration are significantly influenced by the type of honey [38, 39] and are correlated by the colour of the honeys, as the dark-coloured honeys have more phenolic compounds and fewer flavonoids than light-coloured honeys [3]. Moreover, the phenolic profile of different types of honey varies according to the constituents of nectars that are influenced by the variation in the composition of the same plant species growing in different geographical areas [40].

Principal components analysis

The UV-vis spectra data are highly noisy with a high absorbance, specifically in the range of 190–250 nm, thus several preprocessing methods were utilized for reducing the noise and improving the quality of the raw spectral data. PCA was applied in order to discriminate (a) the geographical and (b) the botanical origin of different honey samples, demonstrating a good differentiation among samples, as illustrated in Fig. 2. Specifically, Fig. 2 shows the results of PCA in a two-dimensional score plot of the first two principal components and loading plots of first three components for the discrimination of geographical origin of thyme (1.a and 1.b), pine (2.a and 2.b), and polyfloral (3.a and 3.b) honey samples, and for the discrimination of botanical origin of Greek honeys (4.a and 4.b).

Fig. 2
figure 2

Score plots of the first two components (a) and loading plots of first three components (b) for the discrimination of geographical origin of thyme (1), pine (2), and polyfloral (3) honey samples and botanical origin of Greek honey sample (4) from PCA of UV-vis spectral data pre-processed with SNV in the range of 220–550 nm (regarding the geographical origin: ES: Spanish, GR; Greek, MT: Maltese, TN: Tunisian and TR: Turkish, regarding the botanical origin: Pf: polyfloral, Pi: Pine, and Th: Thyme)

Regarding the geographical origin of different types of honey, the total variance explained by the model for the first three PCs was 97%, 96%, and 98% for thyme, polyfloral, and pine samples, respectively. According to the loading plots (Fig. 2), several wavelengths at around 235, 250, 265, 287, 290, 317, and 330 nm seem to be the most influential variables to classify honey samples. The first PC, which accounts for 79–86% of the variance, has a negative peak at 287 nm. The second PC has negative peaks at 250 and 317 nm (325 nm in the case of pines) and a positive peak at around 283 nm (286 nm in the case of thymes). In the case of analysis of thyme and polyfloral honeys, the third PC has a negative peak at around 260 nm, while two small peaks are shown at around 317 and 330, which are related to the flavonoids and some phenolic compounds, as previously mentioned.

Regarding the botanical origin of Greek honeys, the first three PC explain 97% of the total variance (Fig. 2.(4.b)). As in the previous cases, the peaks at around 240, 255, 280, 285, 308, and 311 nm were found to affect the discrimination of Greek honey samples. The range of 220–550 nm was selected for the classification methods, as the most peaks belong to this range and presented strong correlation.

Classification methods – chemometrics

To correctly classify the honeys according to the geographical and botanical origin, several models were constructed using the optimum number of PC for each class, which was determined by a 10 times random cross validation method, where the dataset was randomly split ten times in 80:20 train-test split, in order to enhance the robustness of the results. The result is the average sample-by-sample classification accuracy.

Geographical origin

Regarding the geographical origin of honey samples, RF and PLS-DA were used to perform a multiclass classification task creating different classes, one per country of origin. In the training stage of the model, 80% of the samples of each class was used to build the models, while for the validation stage, new samples (not used in the training stage) were classified within the established classes to evaluate the models’ efficiency. Using all honey samples in order to develop a general model with RF algorithm to predict the geographical origin, the model achieved an accuracy, precision, recall, and F1-score of approximately 63%, 68%, 64%, and 64%, respectively, which is relatively low. Thus, separate models were developed for each botanical origin of honey and the accuracy achieved was 93%, 82% and 68% for pine, thyme, and polyfloral honeys, respectively. Using PLS-DA, the classification results were inferior compared to RF as shown in Table 2, which summarizes the results obtained from the developed models.

Regarding the thyme honeys, a good differentiation among samples was performed as shown in Table 2 and Fig. 2.(1.a). Thyme honey must contain at least 18% thyme pollen grains (Greek directive 127/2004) [41], meaning that the remaining pollen may belong to other plants, which are located in the particular area around the beehive. As can be deduced from the literature [42,43,44], the pollen content of Mediterranean thyme honeys can be differentiated according to the different percentage of thyme pollen grains and the different types of pollens from other plants. Therefore, the differentiations coming from the pollen grains and the trace elements gave a different fingerprint on the spectrum and, based on these, the discrimination of the honeys’ origin was performed.

In relation to the geographical origin of pine honeys, a clear separation was obtained between the two different countries that produce pine honey (Greece and Turkey), as shown in Table 2. This clear differentiation was probably due to the differences in honeydew elements (fungal spores, hyphae, algae, and wax elements of honeydew-producing insects) and pollen grains of the two groups of pine honey [45, 46], which were reflected in the spectrum. With respect to the classification of polyfloral honeys, the accuracy of the models was inferior to the other types of honey due to the large variability of the pollen types. The polyfloral samples are a combination of blossom and honeydew honeys which are related to the specific areas where they are produced. Thus, the different blossoms and tree varieties resulted in differences in the spectra, and, therefore, in the models.

Table 2 Results from Random Forest and PLS-DA models for determining the geographical and botanical origin of honey samples using UV-vis spectra

Afterwards, DD-SIMCA method, a one-class classification technique, was applied in order to predict a specific class, i.e., whether a honey comes from a specific country or not, and the results are shown in Table 3. Using all honey samples (thyme, pine, and polyflorals), DD-SIMCA models were constructed for Greek, Spanish, Maltese, Tunisian, and Turkish honeys, achieving 62%, 80%, 99%, 78%, and 75% of accuracy in the validation set, respectively. The use of all honey samples for the development of the models, including the differences among the types of honey, and specifically between blossom and honeydew honeys, resulted in reduced accuracy of the models. Therefore, it was deemed necessary to develop models for each type of honey separately. Hence, in the case of thyme honeys, the DD-SIMCA models for Greek, Spanish, and Tunisian thyme honeys achieved an accuracy of 92%, 97%, and 80% in the validation set, respectively. These predictive models misclassified three, one, and nine honeys in the case of Greek, Spanish, and Tunisian thyme honeys, achieving 95%, 99%, and 83% specificity, respectively.

Concerning the models developed for pine honeys, the accuracy of the models was 83% and 87% for Greek and Turkish honeys, respectively. According to the specificity value, which reflects the number of correct negative decisions divided by the total number of negative cases, all samples were correctly classified. However, the same number of original Greek or Turkish honeys were misclassified, reducing the sensitivity and, therefore the accuracy of the models. The accuracy and sensitivity values could be increased by increasing the number of honey samples to validate the models. In the case of polyfloral honeys, the DD-SIMCA models for Greek, Maltese, Spanish, Tunisian, and Turkish honeys achieved an accuracy of 96%, 97%, 97%, 97%, and 94% in the validation set, respectively. The results show that DD-SIMCA models achieved a higher accuracy than RF and PLS-DA on polyfloral honeys. These results are in agreement with de Souza et al. [29] who mentioned that DD-SIMCA outperformed OC-PLS in terms of sensitivity and specificity. Generally, these results are also in agreement with Ansari et al. [34], who successfully discriminated the geographical and botanical origin of honeys from Saudi Arabia, using UV spectra in the wavelength range of 200 to 400 nm, utilizing the SIMCA method. Finally, Roshan et al. [5] were able to successfully differentiate genuine Sidr honeys from non-Sidr honeys using PCA and SIMCA.

Table 3 Results from DD-SIMCA models for determining the geographical and botanical origin of honey samples using UV-vis spectra

Botanical origin

With regard to the botanical origin, Greek thyme, pine, and polyfloral samples were used in order to build classification models. First, RF and PLS-DA were used to perform a multiclass classification task. Using RF, the validation accuracy of the model was 83%, while the values of precision, recall, and F1-score were 81%, 76%, and 73%, respectively. Across various runs, it is observed that some polyfloral samples were misclassified as thyme ones, which is reasonable because polyflorals are honeys produced by several blossom and trees, including Thymus species (Lamiaceae). Thus, after excluding the polyfloral samples, a binary classification model was generated with an accuracy of 94%. Using PLS-DA for the classification of all Greek honeys, the results were inferior compared to RF. However, PLS-DA can handle the binary classification task, i.e., discriminating whether a sample is a thyme or pine one, with a high accuracy (100%).

Proceeding with the DD-SIMCA models, the first model was built in order to predict whether a honey was thyme or not and then it was validated with all the remaining Greek honey samples. The validation accuracy of the model was 87%, as some polyfloral samples were misclassified as thyme ones. When the polyfloral samples were excluded from the analysis, the model achieved higher values of sensitivity and specificity. Hence, we implemented a two-stage approach using SIMCA, where in the first stage the pine samples were used to build a model in order to predict whether a honey was honeydew or blossom and in the second stage the validated samples were used to distinguish whether a honey was thyme or polyfloral. In the first stage, the model achieved an accuracy of 95%, while in the second stage the accuracy was 67%. The results show that the accuracy of the models decreases when they have to separate thyme and polyfloral samples because polyfloral samples examined in the present study may have a significant percentage of thyme pollens and therefore the UV-vis spectra have similarities to the thyme ones. These results compare moderately well with previous studies, in which different models were used in order to classify the botanical origin of honey. Orfanakis et al. [32] applied OPLS-DA in order to classify polyfloral (92.65%) and honeydew (91.30%) honeys and the classification accuracy was over 91%. However, when they tried to classify thyme and polyfloral samples, the classification accuracy was 86.96% and 80%, respectively. Moreover, Suhandy and Yulia [33], which used DD-SIMCA models, confirmed that the different honey types from Indonesia could be distinguished using a wavelength range of 200 to 400 nm, achieving an accuracy, sensitivity, and specificity of 100% for all models.

Combination of both geographical and botanical origin

The main objective of this research was the successful discrimination of honeys that came from a specific geographical and botanical origin. Therefore, several DD-SIMCA models were developed for this reason and the results are shown in Table 4. Regarding Greek thyme honeys, the DD-SIMCA models achieved an accuracy of 91%, 92%, and 91% when the validation set included all honey samples, only thyme samples, and both thyme and polyfloral samples, respectively. There are some small differences in the sensitivity and specificity values among models, which are expected as the number of samples for model validations was different. Figure 3.(1.a) depicts the acceptance plot of the training set for DD-SIMCA models of Greek thyme honeys. The predictive model, using all thyme honey samples for validation, misclassified one Greek thyme sample, while two Tunisian samples were misclassified as Greek thyme ones, as illustrated in the acceptance plot of validation in Fig. 3.(1.b).

Fig. 3
figure 3

Acceptance plot of the training set (1.a, 2.a, and 3.a) and validation set (1.b, 2.b, and 3.b) obtained by DD-SIMCA models using the UV–vis spectra, pre-processed with SNV and 1st derivatives, for Greek (1.a and 1.b), Spanish (2.a and 2.b), and Tunisian (3.a and 3.b) thyme honeys. [Acceptance plot for training set: training samples are illustrated in green circles, while extreme objects are illustrated in orange circles between the green and red lines. Acceptance plot for validation set: the honey samples for target class are illustrated in green circles, while samples for non-target class are depicted in red circles]

Regarding the Spanish thyme honeys, the DD-SIMCA model achieved an accuracy of 97% when the validation set included all thyme honey samples, while the accuracy of the model for Tunisian thyme honeys was 80%. In the case of Spanish thyme honeys, one Tunisian thyme honey was misclassified as Spanish (Fig. 3.(2.b)). On the other hand, as depicted in the acceptance plot of the DD-SIMCA model for Tunisian thyme honeys (Fig. 3.(3.b)), one Tunisian thyme honey was misclassified, while eight thyme samples (seven Greek and one Spanish) were misclassified as Tunisian. These results are in agreement with the results from PCA, as some Greek and Tunisian honeys were found in the same area of the plot (Fig. 2.(1.a)).

Regarding the pine honey samples, the DD-SIMCA models achieved an accuracy of 99% both for Greek and Turkish honeys when the validation set included all honey samples, while the accuracy values were lower, namely 83% and 88%, respectively, when the validation set included only pine samples. As shown in Table 4, the specificity was 100% for both models, meaning that all external validation samples correctly classified as not Greek/Turkish pine honeys. On the other hand, there are some samples from the main class that were misclassified and due to low number of samples, the validation accuracy results in a lower value. This problem could be mitigated by utilizing more samples to build and validate the models.

Table 4 The results obtained from DD-SIMCA models for determining both the geographical and botanical origin of honey samples using UV-vis spectra

Conclusions

Honey samples from different botanical origin, namely thyme, pine, and polyfloral, and geographical regions of Mediterranean (Greece, Malta, Spain, Tunisia, and Turkey) were examined by UV-vis spectroscopy in order to identify their origin. The results of the current study showed that ultraviolet-visible spectroscopy along with multivariate statistical analysis successfully classified the different types of Mediterranean honeys collected from various botanical sources and geographical locations. PCA of the spectra revealed that the wavelengths at around 240, 250, 265, 280, 287, 290, 311, 317, and 330 nm are the most sensitive for discrimination among the samples and thus the wavelength range of 220–550 nm was used for the models’ construction. RF, PLS-DA, and SIMCA were applied to discriminate (a) the geographical, (b) the botanical, and (c) both the combination of the geographical and botanical origin. The prediction performance of these models achieved an accuracy over 90% for most models. Hence, the results from the present study point out that UV-vis spectroscopy has great potential for determining the geographical and botanical origin of Mediterranean honeys in a quick and non-destructive manner, as also having the advantage of being simple, inexpensive, and fast. However, further studies with a larger number of honey samples are necessary, in order to develop robust models and verify the generalization of them.