Introduction

NIR spectroscopy has been established as a cost-efficient analytical tool for determining the authenticity of foodstuffs (Biancolillo et al. 2018; Firmani et al. 2019a, b; Karunathilaka et al. 2018; Ruoff et al. 2006). Compared with other commonly used analytical approaches such as nuclear magnetic resonance and mass spectrometry, two key advantages of this technology are its rapid sample preparation and short measurement time and its low acquisition and maintenance costs (Duarte et al. 2004; Malik et al. 2010). This makes NIR spectroscopy suitable for incoming goods inspection at food processing companies, whereas most other methods are unsuitable for this purpose because of their higher personnel and infrastructure requirements. NIR analysis is also simple and environmentally friendly, as no solvents are required (Armenta et al. 2008; Blanco and Villarroya 2002; Gałuszka et al. 2013). However, sample preparation often includes lyophilization (freeze-drying), which is its most time-intensive step. Forgoing lyophilization would make NIR spectroscopy much more attractive for incoming goods inspection in an industrial setting, because a result would then be available within minutes.

NIR spectra of foods with a high water content show broad water absorption bands around 6940 cm−1 and 5180 cm−1 (Büning-Pfaue 2003). These bands mask other information needed for a valid interpretation. For this reason, NIR measurements are often preceded by relatively time-consuming freeze-drying, which can take from a few hours to a few days depending on sample volume and water content (Arndt et al. 2020b, 2021; Büning-Pfaue 2003; Richter et al. 2019; Segelke et al. 2020). This step, however, negates the advantage of rapid data acquisition. In some cases, well-performing classification models of foodstuffs can be generated even without freeze-drying (Aykas and Menevseoglu 2021; Ghidini et al. 2019; Ríos-Reina et al. 2018; Zhou et al. 2015). Whether freeze-drying is necessary at all, and how long it must last to obtain successful classification results in a reasonable amount of time, has not yet been adequately answered. Studies on meat, fish, cheese, and almonds have compared the classification results of non-lyophilized and lyophilized samples, but without investigating the influence of water on the NIR spectra in depth (Andueza et al. 2013; Arndt et al. 2020a; Bázár et al. 2009; Xiccato et al. 2004). Arndt et al. (2020a) compared different preparation methods of almonds (Prunus dulcis Mill.) with respect to the classification of their geographical origin. Of the whole, bisected, ground, and lyophilized almond samples compared, lyophilized almonds provided the best classification results. The present study builds on this approach by examining the freeze-drying process itself in detail, together with its influence on the classification results.

Almonds are susceptible to food fraud because of their generally high price and the large price differences between geographical origins (Food and Agriculture Organization of the United Nations 2021; Manning and Soon 2016). Moreover, almonds from Mallorca carry a protected geographical indication, which makes them particularly attractive to quality-conscious consumers (Publications Office of the European Union 2013). To counteract possible fraud, methods are therefore needed that allow the geographical origin to be determined experimentally. Ideally, such fraud should be detected at the point of receipt so that misdeclared goods can be rejected.

To identify the spectral features responsible for successful classification, we applied Boruta, a random forest–based feature selection approach that compares the importance of each feature with the maximum importance of so-called shadow features, which are obtained by permuting the data across observations (Kursa and Rudnicki 2010). We selected Boruta from the large number of existing selection techniques because it was the best-performing method in a comprehensive comparison study (Degenhardt et al. 2019; Janitza et al. 2018; Kursa and Rudnicki 2010; Seifert et al. 2019).

In order to make NIR spectroscopy more useful for rapid authentication of nuts in the context of incoming goods inspection, we investigated the water content during the freeze-drying process and used supervised multivariate methods to obtain classification accuracies and compare them between non-freeze-dried samples and samples which were freeze-dried for 3, 24, and 48 h. The aim was to investigate whether freeze-drying is required to obtain a well-functioning classification model, or if there is a minimum drying time to achieve this goal.

Materials and Methods

Samples

A total of 72 almond samples were used, twelve from each of the six most economically relevant growing countries (Australia, Iran, Italy, Morocco, Spain, and the USA). Samples were acquired shelled or unshelled. To reflect maximum biological diversity, samples from the crop years 2016 to 2019 were analyzed; Australian samples covered only the years 2017 to 2019 because of the seasonal shift between hemispheres. In total, 12 samples from 2016, 19 from 2017, 23 from 2018, and 18 from 2019 were analyzed. The samples comprised more than 20 different varieties, including sweet and bitter almonds. They were acquired either directly from growers or from personally known importers and were thus considered authentic. An overview of the analyzed samples is given in Table S1 in the Supplementary Information. In addition, a non-authentic almond sample was purchased in a supermarket in Hamburg (Germany) and used as a quality control sample to track the lyophilization process, since this approach required a large amount of sample material.

Sample Preparation

To study the lyophilization process, the quality control almond sample was first snap-frozen in liquid nitrogen and then ground under dry ice cooling (1/2 wt/wt, almonds/dry ice) using a knife mill (Grindomix GM 300, Retsch, Haan, Germany). The homogenized sample was then divided into aliquots of 4.0 ± 0.1 g each. Each aliquot was lyophilized (Beta 1–8 LSCplus, Martin Christ Freeze Dryers GmbH, Osterode, Germany) for a different length of time to monitor the decrease in water content. The following lyophilization durations were chosen: 30 min, 1 h, 2 h, 3 h, 6 h, 9 h, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, and 7 days. Halfway through its lyophilization period, each sample was mixed by vigorous shaking.

The authentic almond samples were prepared analogously to the quality control sample; in addition, unshelled samples were cracked manually under dry ice cooling to remove the endocarp. The ground powders were analyzed both directly and after lyophilization of 4.0 ± 0.1 g per sample for 3 h, 24 h, and 48 h.

Karl-Fischer Titration

To determine the water content, approximately 300 mg of sample was suspended in approximately 50 mL of anhydrous methanol (CombiMethanol, Merck KGaA, Darmstadt, Germany) (Gallina et al. 2010). The suspension was then titrated automatically to the end point (888 Titrando, Deutsche Metrohm GmbH & Co. KG, Filderstadt, Germany) using Karl-Fischer reagent (CombiTitrant 5, Merck KGaA, Darmstadt, Germany). To determine the titer, the titrant consumption of 20 µL of distilled water was measured in triplicate.
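The titer and the water content follow from simple ratios. The sketch below assumes the standard volumetric Karl-Fischer relations (titer = mass of water / titrant volume; water content = titrant volume × titer / sample mass) and a water density of 0.998 g/mL. The titrant volumes in the example are hypothetical values chosen only for illustration and are not measurements from this study.

```python
from typing import Sequence

WATER_DENSITY_G_PER_ML = 0.998  # assumption: density of water at about 20 °C


def titer_mg_per_ml(water_volume_ul: float, titrant_volumes_ml: Sequence[float]) -> float:
    """Mean titer (mg water per mL titrant) from replicate water standards."""
    water_mg = water_volume_ul * WATER_DENSITY_G_PER_ML  # µL x g/mL = mg
    titers = [water_mg / v for v in titrant_volumes_ml]
    return sum(titers) / len(titers)


def water_content_percent(titrant_ml: float, titer: float, sample_mg: float) -> float:
    """Water content of the sample in % (w/w)."""
    return titrant_ml * titer / sample_mg * 100.0


if __name__ == "__main__":
    # Hypothetical titrant volumes for the triplicate 20 µL water standard.
    titer = titer_mg_per_ml(20.0, [3.99, 4.00, 4.01])  # about 4.99 mg/mL
    # Hypothetical titrant consumption for a 300 mg sample.
    print(f"{water_content_percent(0.24, titer, 300.0):.2f} %")
```

Averaging the three replicate titers, rather than the three volumes, keeps each replicate's own ratio; for nearly identical volumes the two choices give practically the same result.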

NIR Spectroscopy

Portions of 1.25 ± 0.01 g of each non-lyophilized or freeze-dried ground almond sample were thawed at 22 ± 2 °C in closed glass vials (52.0 mm × 22.0 mm × 1.2 mm, Nipro Diagnostics Germany GmbH, Ratingen, Germany). The samples were analyzed in reflectance mode using an FT-NIR spectrometer with an integrating sphere (TANGO, Bruker Optics, Bremen, Germany). Fifty scans per spectrum were recorded at a resolution of 2 cm−1 over a wavenumber range of 11,550–3,950 cm−1. Data were acquired using OPUS software (Bruker Optics, Bremen, Germany). To obtain representative spectra of each sample, five technical replicates were measured, with the vials shaken vigorously between measurements.

NIR Spectra Preprocessing

MATLAB R2021b (The MathWorks Inc., Natick, USA) was used for all preprocessing steps. Differences in particle size cause light-scattering effects, so multiplicative scatter correction (MSC) was applied to reduce these effects in the NIR spectra (Dhanoa et al. 1994; Geladi et al. 1985). Because adjacent features are strongly correlated, the number of features can be reduced, thereby decreasing computation time (Engel et al. 2013). Binning was therefore performed by averaging the intensities of five adjacent wavenumbers, leaving 744 features for classification (Arndt et al. 2021; Shakiba et al. 2021). Finally, the arithmetic mean of the five replicate spectra of each sample was calculated.
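The preprocessing order described above (MSC, binning by five, then averaging of the replicates) can be illustrated with a short NumPy sketch. It is not the MATLAB code used in this study: using the mean of all replicate spectra as the MSC reference and dropping trailing points that do not fill a complete bin are assumptions. The 3720 raw data points in the example are a hypothetical value chosen so that binning yields 744 features.

```python
from typing import Optional

import numpy as np


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction of a (n_spectra, n_points) array.

    Each spectrum s is regressed on the reference r (s ≈ offset + slope * r),
    and the fitted offset and slope are removed: (s - offset) / slope.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, s in enumerate(spectra):
        slope, offset = np.polyfit(ref, s, deg=1)
        corrected[i] = (s - offset) / slope
    return corrected


def bin_spectra(spectra: np.ndarray, width: int = 5) -> np.ndarray:
    """Average `width` adjacent points; an incomplete last bin is dropped."""
    n_bins = spectra.shape[1] // width
    trimmed = spectra[:, : n_bins * width]
    return trimmed.reshape(spectra.shape[0], n_bins, width).mean(axis=2)


def preprocess(replicates: np.ndarray, width: int = 5) -> np.ndarray:
    """(n_samples, n_replicates, n_points) -> (n_samples, n_bins)."""
    n_samples, n_rep, n_points = replicates.shape
    flat = replicates.reshape(n_samples * n_rep, n_points)
    binned = bin_spectra(msc(flat), width)
    # Mean of the replicate spectra of each sample.
    return binned.reshape(n_samples, n_rep, -1).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 72 samples x 5 replicates x 3720 points.
    raw = rng.random((72, 5, 3720)) + 1.0
    print(preprocess(raw).shape)  # (72, 744)
```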

Multivariate Data Analysis

MATLAB R2021b was used for principal component analysis (PCA) and to train and validate classification models with various machine learning algorithms. First, the Classification Learner app was used to identify which algorithms gave the most promising initial results on the processed NIR data. For each of the four data sets, samples were classified according to their country of origin. Decision trees, discriminant analyses, support vector machines, nearest neighbor classifiers, and ensembles of classifiers were applied to the data. The algorithms with the highest classification accuracy were linear discriminant analysis (LDA) and an ensemble of discriminant classifiers using random subspaces, a variant of bagging in which each learner is trained on a random subset of the features rather than on a resampled set of observations (Ho 1998; Rokach 2010; Tharwat et al. 2017). To validate the models, a repeated nested cross-validation (RNCV) was used: the NIR data of the 72 samples and their class labels were split, stratified by geographical origin, into six outer training and test sets, and this procedure was repeated five times. During model building, each outer training set was again split into six inner training and test sets (Varma and Simon 2006; Watermann et al. 2021). This approach was followed independently for each of the four data sets. RNCV minimizes the risk of unrepresentative (outlier) splits, gives an averaged classification result for each lyophilization duration, and uses every sample for validation multiple times. The reported training and test accuracies are the means over all corresponding splits of the RNCV. In addition, the macro-F1 score was used as a further measure of classification performance, and Fleiss’ kappa was used to assess the agreement among the five classification results of each sample (Davies and Fleiss 1982; Tharwat 2021).
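The validation scheme can be sketched with scikit-learn as follows. This is a hedged illustration, not the MATLAB implementation used here. The task of the inner loop (here, choosing an LDA shrinkage intensity from a hypothetical grid) is an assumption, since the text does not state which hyperparameters were tuned, and the shuffling seeds are arbitrary.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def repeated_nested_cv(X, y, n_repeats=5, n_outer=6, n_inner=6):
    """Return mean test accuracy, mean macro-F1 and the predicted labels
    (one row per repetition, one column per sample)."""
    X, y = np.asarray(X), np.asarray(y)
    predictions = np.empty((n_repeats, len(y)), dtype=y.dtype)
    accuracies, macro_f1s = [], []
    for rep in range(n_repeats):
        # Outer split, stratified by class (here: country of origin).
        outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=rep)
        for train, test in outer.split(X, y):
            inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=rep)
            # Hypothetical inner-loop task: pick the LDA shrinkage intensity.
            model = GridSearchCV(
                LinearDiscriminantAnalysis(solver="lsqr"),
                param_grid={"shrinkage": [0.01, 0.1, 0.5, 0.9]},
                cv=inner,
            )
            # Refits the best setting on the whole outer training set.
            model.fit(X[train], y[train])
            pred = model.predict(X[test])
            predictions[rep, test] = pred
            accuracies.append(accuracy_score(y[test], pred))
            macro_f1s.append(f1_score(y[test], pred, average="macro"))
    return float(np.mean(accuracies)), float(np.mean(macro_f1s)), predictions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(6), 12)   # six classes of twelve samples
    X = rng.normal(size=(72, 50))
    X[np.arange(72), y] += 5.0        # synthetic, well-separated classes
    acc, f1, preds = repeated_nested_cv(X, y)
    print(f"accuracy {acc:.3f}, macro-F1 {f1:.3f}")
```

Because the inner loop only ever sees the outer training data, the outer test samples remain untouched by model selection, which is the point of nesting.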

Feature Selection with Boruta

The R package Pomona (version 1.0.1, https://github.com/silkeszy/Pomona) was used for Boruta feature selection with the parameters: ntree = 10,000, min.node.size = 1, mtry = 142, importance = impurity_corrected, maxRuns = 100, and pValue = 0.01 (Degenhardt et al. 2019).
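The core idea of Boruta can be illustrated with a simplified scikit-learn sketch. It is not the Pomona implementation. It uses the plain impurity importance of scikit-learn rather than the corrected impurity importance, runs a fixed number of iterations without removing rejected features, and confirms a feature with a one-sided, Bonferroni-corrected binomial test on its hits. As a rough correspondence, n_estimators plays the role of ntree, max_features of mtry, min_samples_leaf of min.node.size, n_iter of maxRuns, and alpha of pValue. The defaults below are illustrative and are not the settings listed above.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(X, y, n_iter=20, alpha=0.01, n_estimators=200,
                  max_features="sqrt", random_state=0):
    """Return the indices of features confirmed as relevant."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: every column permuted independently across observations.
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=n_estimators,
                                        max_features=max_features,
                                        min_samples_leaf=1,
                                        random_state=random_state + it)
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        # A hit: the real feature beats the most important shadow feature.
        hits += importance[:n_features] > importance[n_features:].max()
    # Under the null hypothesis a feature wins with probability 0.5 per iteration.
    p_values = np.array([binom_sf(int(h), n_iter) for h in hits])
    return np.flatnonzero(p_values * n_features < alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(6), 12)
    informative = np.column_stack([y, 2 * y, -y]) + rng.normal(0, 0.1, (72, 3))
    X = np.hstack([informative, rng.normal(size=(72, 27))])  # 27 pure-noise columns
    print(boruta_sketch(X, y, n_estimators=100))
```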

Results and Discussion

Change of the Water Content During Freeze-Drying of a Quality Control Almond Sample

To monitor the freeze-drying process, 4 g aliquots of the quality control almond sample were lyophilized for different periods of time. Figure 1 shows the water contents determined by Karl-Fischer titration: after a lyophilization period of 24 h, the water content of the quality control sample remained constant at 0.36 ± 0.13%. The NIR spectra of the lyophilized samples confirmed these results, as the water band at 5300–4950 cm−1 decreased continuously until it stagnated after 24 h of lyophilization. The range between 7200 and 6050 cm−1 was also influenced by the drying process, but only during the first 2 h of lyophilization (see Fig. S1 in the Supplementary Information).

Fig. 1

Scatter plot of the water content determined by Karl-Fischer titration from the quality control sample aliquots against their corresponding lyophilization durations

In previous studies, a freeze-drying time of 48 h was used for 60 g of ground almond sample (Arndt et al. 2020a, 2021). Results from the present study suggested that a lyophilization duration of 24 h for 4 g of sample material might already be sufficient to reduce the water content to a constant level for successful classification.

Change of the Water Content During Freeze-Drying of Authentic Almond Samples

To further investigate the lyophilization duration, we compared the NIR spectra of the 72 almond samples freeze-dried for 3 h, 24 h, and 48 h with those of the non-lyophilized samples. Figure 2a shows the processed and averaged NIR spectra of all samples for each lyophilization duration, and Fig. 3 shows the corresponding second-derivative spectra. The O–H bonds of water absorb at 9550–9390 cm−1, 7100–6240 cm−1, and 5155–5150 cm−1 (Workman and Weyer 2012). The mean spectra (Fig. 2a) of the non-lyophilized samples (Lyo-0 h) and of the samples freeze-dried for 3 h (Lyo-3 h), 24 h (Lyo-24 h), and 48 h (Lyo-48 h) deviated strongly in the regions 7200–6050 cm−1 and 5300–4950 cm−1; however, the spectra after 24 h and 48 h of lyophilization differed only minimally from each other. In the second-derivative spectra (Fig. 3), large differences were likewise observed between 5400 and 5150 cm−1, whereas the differences between 7200 and 6050 cm−1 were much smaller than in the non-derivative spectra. As expected, longer lyophilization led to a lower water concentration and thus to lower absorbance in these spectral regions. Figure 2b shows boxplots of the signal intensities of the bin at 5167 cm−1, which had the highest absorbance in the water band region of the non-derivative data, for the four sample sets. The boxplots of the spectra without lyophilization and after 3 h of lyophilization did not overlap and were significantly different (p-value: 3.10 × 10−76). The samples lyophilized for 3 h also differed significantly from those lyophilized for 24 h (p-value: 6.03 × 10−71) and 48 h (p-value: 7.57 × 10−78).
In contrast, the boxplots of the spectra after 24 and 48 h of freeze-drying overlapped; although the difference was still statistically significant (p-value: 1.03 × 10−6), the differences in absorbance were small compared with those between the other lyophilization durations. This supports the Karl-Fischer results, which suggested that a lyophilization duration of 24 h is sufficient to minimize the water content. In addition, nine non-lyophilized samples with high-, medium-, and low-intensity water bands in their NIR spectra were analyzed by Karl-Fischer titration to determine their actual water contents. The water content of these samples ranged from 3.0 to 10.5%, which was reflected by the intensities of the water bands in the respective NIR spectra (see Fig. S2 and Table S2 in the Supplementary Information). In summary, the observed spectral variations reflected the different water contents expected at different lyophilization durations. In the following sections, we evaluate how the water signals in the NIR spectra affected the determination of the geographical origin of almonds.

Fig. 2

Results of the systematic evaluation of the lyophilization duration on the NIR spectra of almonds: (a) processed (MSC, binning, mean averaging) NIR spectra where the average of all 72 samples from each of the four sample sets was used. (b) Boxplots of the signal intensity of the water signal at 5167 cm−1 from the four sample sets

Fig. 3

Processed (MSC, second derivative, binning, mean averaging) NIR spectra where the average of all 72 samples from each of the four sample sets was used

Principal Component Analysis

PCA was used as an unsupervised method to analyze whether the main variances in the data sets with different lyophilization durations can be exploited for origin separation, and to identify outliers. We also used this approach to investigate the influence of the harvest year. The scores plots, colored and marked by origin, are shown in Fig. 4a–d, and the loadings for the four lyophilization durations in Fig. 4e–h. Figure 5 shows the scores plot of the PCA in which the samples of the Lyo-48 h data set are colored and marked by harvest year; the corresponding scores plots for the other lyophilization durations are given in Fig. S3 in the Supplementary Information. With respect to geographical origin, the scores plots of the freeze-dried samples were largely similar. The first principal component (PC1) explained 56.5–57.7% of the total variance for the freeze-dried samples and 76.8% for the non-freeze-dried samples.
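For reference, the scores, loadings, and percentage of explained variance discussed here can be computed from a singular value decomposition of the mean-centered data. The NumPy sketch below assumes mean-centering without scaling, and the synthetic data in the usage example are hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 3):
    """Return scores, loadings and explained variance (%) of the first components."""
    Xc = X - X.mean(axis=0)                      # mean-center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T               # one column per component
    explained = 100.0 * S**2 / np.sum(S**2)      # % of total variance per PC
    return scores, loadings, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(72, 744))
    X[:, :100] += 5.0 * rng.normal(size=(72, 1))  # one dominant direction
    scores, loadings, explained = pca(X)
    print(explained.round(1))
```

A large share for PC1, as in the non-freeze-dried data, means a single direction (here, one dominated by the water bands) accounts for most of the spread between samples.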

Fig. 4

Scores and loadings plots of the processed NIR spectra of almond samples without lyophilization (Lyo-0 h, a, e) and after 3 h (Lyo-3 h, b, f), 24 h (Lyo-24 h, c, g), and 48 h (Lyo-48 h, d, h) of lyophilization (Australia: circle, USA: square, Italy: star, Iran: diamond, Spain: asterisk, Morocco: triangle). Principal components 1 and 2 are shown for the first three sample sets, and principal components 1 and 3 for Lyo-48 h

Fig. 5

Scores plot of the processed NIR spectra of Lyo-48 h with the harvest years marked (2016: circle, 2017: asterisk, 2018: diamond, 2019: triangle)

In the scores plot of the non-lyophilized samples (see Fig. 4a), the Iranian (diamond), Moroccan (triangle), USA (square), and Australian (circle) samples overlapped strongly and clustered in the upper left. Spanish (asterisk) and Italian (star) samples clustered in the lower left, close to each other but with little overlap. Two Australian samples had particularly high PC1 scores. Since the loadings of this PC (see Fig. 4e) were dominated by the water bands around 5150 cm−1 and 7000 cm−1, this indicated that these samples had a high water content. This was confirmed by inspection of the two individual NIR spectra (see Fig. S2 in the Supplementary Information), and Karl-Fischer titration showed water contents of around 10% (see Table S2 in the Supplementary Information). Spanish samples clustered further to the right than samples from any other country, which indicated that they generally had higher water contents.

In the scores plot of Lyo-3 h (see Fig. 4b), Iranian and Moroccan samples clustered in the lower part, which showed that these samples mostly had lower absorbance at 6000–5000 cm−1 (see Fig. 4f). Italian samples mainly clustered to the right and Iranian samples to the left, with the band around 4400–4200 cm−1 contributing most to PC1. Spanish, Australian, and USA samples were mostly in the center of the scores plot and showed high intragroup variances. The water signal around 5200 cm−1 had a small influence on PC1 and a large influence on PC2, illustrating that after 3 h of lyophilization the water signal contributed less to the total variance of the samples.

The scores and loadings plots of Lyo-24 h and Lyo-48 h (see Fig. 4c, d, g, h) were very similar to each other. In both scores plots, Spanish and Italian samples clustered in the lower right, driven more by PC1 than by PC2, whereas Iranian and Moroccan samples clustered in the upper left and USA samples in the center. Australian samples clustered in the center for Lyo-48 h and more to the right for Lyo-24 h. The bands between 7100 and 6000 cm−1 had negative loadings and those between 6000 and 5200 cm−1 positive loadings on PC1, with the reverse for PC2; this pattern was important for the clustering of Italian, Spanish, Iranian, and Moroccan samples.

Because the metabolome is also strongly affected by external factors other than the country of origin, such as different climatic conditions in different harvest years, it is important to investigate a balanced sample set containing samples from several harvest years in order to obtain a robust model. In addition, the influence of the harvest year was examined by PCA. For this purpose, the Lyo-48 h samples were marked by harvest year in the scores plot in Fig. 5 (see Fig. 4h for the loadings plot). No distinct clusters corresponding to harvest years were apparent; instead, samples from all harvest years were scattered across the plot. Since no effect of the harvest year was apparent in PC1 or PC3, this suggests that geographical origin had a greater influence than harvest year. As Arndt et al. (2021) outlined, several factors such as climate, storage, and anthropogenic influences can vary between harvest years, which makes the impact of the harvest year difficult to evaluate.

Overall, the PCA did not show clear separations between samples originating from different countries, so supervised machine learning methods were used to achieve this goal.

Supervised Approaches

The processed NIR data sets were used to train supervised classification models for each lyophilization duration. First, we used the Classification Learner app of MATLAB R2021b to compare the classification performance of different algorithms (see Table S3 in the Supplementary Information). LDA provided the best results for each data set and was therefore selected for subsequent model building and classification. The results for the different lyophilization durations, i.e., the averaged training and test accuracies, macro-F1 scores, and Fleiss’ kappa coefficients, are shown in Table 1. The classification performances were very similar across lyophilization durations, as shown by the test accuracies (92.5–95.0%), macro-F1 scores (92.0–94.7%), and Fleiss’ kappa coefficients (88.0–94.5%). Taken together, this suggested that the models predicted the samples well and that the five repetitions were largely consistent. More notable than the individual model performances was that no trend toward better classification with longer lyophilization durations was observed.
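Fleiss’ kappa treats the five RNCV repetitions as raters that each assign every sample to one country. Observed agreement is averaged over samples and compared with the agreement expected by chance from the overall class proportions. A minimal NumPy sketch of the standard formula follows. The input layout (one row per repetition, one column per sample) is an assumption, and Table 1 reports the coefficient in percent, which presumably corresponds to 100 × κ.

```python
import numpy as np


def fleiss_kappa(predictions: np.ndarray, categories) -> float:
    """Fleiss' kappa for a (n_raters, n_subjects) array of assigned labels.

    Here the raters are the RNCV repetitions and the subjects are the samples.
    """
    n_raters, n_subjects = predictions.shape
    # counts[i, j]: number of raters assigning subject i to category j
    counts = np.array([[np.sum(predictions[:, i] == c) for c in categories]
                       for i in range(n_subjects)], dtype=float)
    # Observed agreement per subject and its mean.
    p_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category proportions.
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.sum(p_j**2)
    return float((p_bar - p_e) / (1.0 - p_e))


if __name__ == "__main__":
    true_labels = np.repeat(np.arange(6), 12)   # 72 samples, 6 countries
    preds = np.tile(true_labels, (5, 1))         # five identical repetitions
    preds[2, 0] = 3                              # one hypothetical disagreement
    print(round(fleiss_kappa(preds, range(6)), 4))
```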

Table 1 Classification results, macro-F1, and Fleiss’ kappa coefficient for the different sample sets using all 744 features with LDA. For all models, an RNCV was used

In a previous study, 64 authentic almond samples yielded an accuracy of 71.9 ± 3.5% for determining the geographical origin of homogenized almonds and a slightly higher accuracy of 80.2 ± 1.9% for lyophilized almonds (Arndt et al. 2020a). In contrast, we did not observe differences in classification accuracy between lyophilized and homogenized sample material. Apart from data preprocessing, the main difference was the classifier: we used LDA, whereas Arndt et al. (2020a) used a support vector machine. Even though both studies used RNCV to build robust models, the different data evaluation could lead to different results. Another possible reason for the different classification results is class size: in the present study, each class contained twelve samples, and larger classes generally lead to better classification performance because they better describe the extremities and centroids of the classes (Foody et al. 2006).

Across all four sample sets, only three samples were misclassified at least once. Tables S4–S7 in the Supplementary Information list these samples and the countries to which they were assigned in the five repetitions of the RNCV. One of the three frequently misclassified samples was a 2018 Australian sample that was classified as Australian only once, as Spanish twelve times, and as Moroccan seven times. As the scores plots of all four data sets show (Fig. 4a–d), this was likely due to the large intragroup variance of the Australian samples. Another sample misclassified at least once under all four lyophilization durations was an Iranian sample from 2016, which was classified as Australian once, as Italian once, as Spanish seven times, as Iranian eight times, and as USA three times. This can be explained by the metadata of the sample set: almost all samples from Australia, Spain, Italy, and the USA were sweet almonds, whereas almost all Moroccan and Iranian samples were bitter almonds. Since the only Iranian sweet almond sample was misclassified, the model was probably valid only for Iranian bitter almond samples. The third misclassified sample was an Italian sample from 2016, which was classified once as Iranian, four times as Italian, and 15 times as Moroccan. These misclassifications were again related to the difference between bitter and sweet almonds, as this sample belongs to a bitter variety. Moroccan and Iranian sweet almonds and Italian bitter almonds were underrepresented in their respective sample groups. These samples were therefore misclassified more often during RNCV, because the training data did not adequately represent these subspecies. However, Moroccan and Iranian samples were rarely misclassified as each other. Similarly, samples from the countries that mainly contained sweet almonds (Australia, Italy, Spain, USA) were rarely confused with each other.
Thus, an influence of the subspecies, although challenging to quantify, did not result in low classification accuracies or a high number of misclassifications between countries mainly containing a specific subspecies.

These results suggested that NIR spectroscopy can discriminate between sweet and bitter almonds, as already demonstrated by Borràs et al. (2014). Cortés et al. (2018) even predicted the amygdalin content of almonds, with reference values determined by high-performance liquid chromatography with diode array detection, by applying partial least squares regression to NIR data. This suggests that the amygdalin content affected the NIR spectra and thus the discrimination between sweet and bitter almonds.

Feature Selection with Boruta

Feature selection was conducted to find out which spectral regions, and thus which substance classes, were relevant for determining the geographical origin of almonds. The goal was to gain more information on the NIR spectra rather than to develop a targeted approach with fewer features. For this purpose, we used the Boruta algorithm, which selected 43, 40, 41, and 35 features in the data sets of spectra obtained after lyophilization durations of 0, 3, 24, and 48 h, respectively (see Table S8 in the Supplementary Information). The selected features for each lyophilization duration were then used to build classification models. LDA was initially used for this purpose, but it yielded classification accuracies below 60%. The best classification method was therefore determined in the same way as for the models using all features (see Table S9 in the Supplementary Information). An ensemble of discriminant classifiers using random subspaces performed best, with test accuracies between 81.1 and 85.0% (Table 2). These classification performances were about 10 percentage points lower than those obtained with all 744 features, although fewer than 6% of the features were used, indicating that Boruta captured most of the features important for geographical identification. All selected features of the four sample sets are listed in Table S8 in the Supplementary Information and are highlighted in Fig. 6 in the processed averaged NIR spectrum of the non-lyophilized samples.
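An ensemble of discriminant classifiers using random subspaces can be approximated in scikit-learn with BaggingClassifier: sample bootstrapping is disabled and each LDA learner receives a random subset of the features. This is a sketch, not the MATLAB ensemble used here. The number of learners (30) and the subspace size (half of the features) are assumptions, and the data in the usage example are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier


def random_subspace_lda(n_learners: int = 30, subspace_fraction: float = 0.5,
                        random_state: int = 0) -> BaggingClassifier:
    """Random subspace ensemble: every LDA learner sees all training samples
    but only a random subset of the features."""
    return BaggingClassifier(
        LinearDiscriminantAnalysis(),      # base learner (passed positionally)
        n_estimators=n_learners,
        max_samples=1.0,
        bootstrap=False,                   # no resampling of observations
        max_features=subspace_fraction,    # fraction of features per learner
        bootstrap_features=False,          # features drawn without replacement
        random_state=random_state,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(6), 12)
    X = rng.normal(size=(72, 40))          # e.g. about 40 Boruta-selected bins
    X[np.arange(72), y] += 4.0             # synthetic class structure
    model = random_subspace_lda().fit(X, y)
    print(model.score(X, y))
```

Restricting each learner to a subset of features keeps the individual LDA models small relative to the number of training samples, while averaging across learners recovers the information spread over all selected bins.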

Table 2 Classification results, macro-F1, and Fleiss’ kappa coefficient for the different sample sets using their respective Boruta features with an ensemble of discriminant classifiers using random subspaces. For all models, an RNCV was used
Fig. 6

Processed mean averaged NIR spectrum of the Lyo-0 h sample set, with all features selected by Boruta in the four sample sets marked as red crosses

The selected features came from different parts of the spectrum that can be assigned to different classes of molecules, mainly lipids and proteins, although a definite assignment was not possible. Features assigned to lipids were found between 8700 and 8500 cm−1, originating from the second overtone of C–H vibrations and the fourth overtone of C=O vibrations. In addition, features between 6000 and 5800 cm−1 could be assigned to the first overtone of saturated and unsaturated hydrocarbon chains, and features between 7400 and 7000 cm−1 to C–H bonds of aliphatic hydrocarbons (Burns and Ciurczak 2007; Workman and Weyer 2012).

Boruta also selected many bands associated with peptides and proteins. At 4410 cm−1, C=O and N–H vibrations of peptides in α-helices or β-sheets can be observed (Workman and Weyer 2012). Between 7000 and 6000 cm−1, several features corresponded to the first overtone of N–H and O–H bonds originating from the peptide bond and amino acid side chains as well as from carbohydrates (Burns and Ciurczak 2007; Workman and Weyer 2012).

Boruta selected many features (26 of 43 for the non-lyophilized samples) within the range of the water signal between 7100 and 6240 cm−1, which originate from N–H and O–H bonds of proteins and carbohydrates. The NIR spectra of the non-lyophilized samples showed more intense water signals, reflecting their higher water content. This raised the question of whether the water content was relevant for determining provenance, which would be problematic because water content is affected by post-harvest factors such as drying conditions and transport. Therefore, boxplots of the intensity of the water band at 5167 cm−1 were generated for each country of origin (Fig. 7). They showed that the intensities, and hence the water contents, were comparable, with no significant differences between countries (p-value > 0.05). In addition, the bins around the water signal at 5150 cm−1 were not selected by Boruta. We therefore concluded that water content had no major impact on the classification model. To evaluate how well the selected features in this region discriminate the origin of almonds, an additional classification model was built with the ensemble algorithm using only the selected features between 7200 and 6000 cm−1 (see Table S8 in the Supplementary Information). The resulting test accuracy of 70.6 ± 13.9% showed that these features carry important information for classification.

Fig. 7

Boxplots of the signal intensity of the water signal at 5167 cm−1 from the processed and mean averaged samples of each country in the Lyo-0 h sample set

In our analysis, Boruta selected the features at 7194 cm−1 and 5794 cm−1, which Borràs et al. (2014) identified as relevant for differentiating between sweet and bitter almonds. Although distinguishing sweet from bitter almonds was not the aim of this study, it was still relevant here because almost all Iranian and Moroccan samples were bitter almonds and almost all other samples were sweet almonds.

Conclusion

The aim of this work was to investigate how water content affects the classification results of almond geographical origin determination by NIR spectroscopy. Omitting freeze-drying resulted in a test accuracy of 93.9 ± 6.4%, which was in the same range as for the samples lyophilized for 3, 24, and 48 h; longer lyophilization durations brought no improvement. Although the water content of the non-freeze-dried samples ranged from 3.0 to 10.5%, it had no apparent effect on classification. The selected features indicated that lipids and proteins were responsible for successful discrimination. This study demonstrated that the geographical origin of non-lyophilized almonds can be determined by NIR spectroscopy, a rapid and cost-effective analytical method suitable for incoming goods inspection and food fraud investigation. The results also suggest that freeze-drying may not be necessary for such analytical tasks, which is particularly useful for high-priced foods with low water content, such as nuts and seeds.