Introduction

Hazelnut (Corylus avellana), which grows only in certain parts of the world and is one of the traditional agricultural products of Turkey, holds an important position among hard-shelled fruits in terms of both nutrition and human health. Hazelnuts contain about 64% vegetable oil, 16.5% protein, and 14% carbohydrate, together with minerals (phosphorus, iron, and calcium) and vitamins (A, B1, B2, B6, C, and E), and they have a cholesterol-lowering effect. According to scientific studies, 100 g of hazelnuts meets 22% of the daily human protein requirement and provides 634 kcal of energy.

The production and export of hazelnuts contribute significantly to the economies of several countries, such as Turkey, Italy, the United States, and Spain. Hazelnuts are a valuable crop that creates employment and income for many people in these regions [1,2,3]. Moreover, demand for agricultural products keeps growing while the agricultural workforce is shrinking. As a result, applications that integrate technology into agriculture to increase productivity are becoming increasingly widespread.

Hazelnut classification provides important input for identifying the type of hazelnut under study and for taking appropriate actions during both growing and harvesting. However, when hazelnuts are produced in large quantities, grading becomes time-consuming and complicated. Because hazelnut classification is done manually, it is costly and slow, mainly owing to labor shortages and the high cost of manual labor, and the manual approach also reduces classification precision. Hence, highly precise, efficient, non-destructive, and safe alternative methods for assessing hazelnut varieties and quality are needed [4], and effective automation and organization systems may be required to cope with high production volumes. For these reasons, machine learning and deep learning-based artificial intelligence applications are being developed in agriculture, as in most other fields [5,6,7].

In the literature, deep learning-based applications have been developed for different agricultural products [8,9,10,11,12,13]. Deep learning is a dynamic field of machine learning research that can process visual, text, signal, and other types of data. Visual data are typically processed with Convolutional Neural Networks (CNNs), one of the main deep learning methods. CNNs enable computer vision applications that automate quality control and production in the food industry. They can categorize food products by type, ripeness, quality, size, and other characteristics, which enables fast and precise sorting, helps identify and remove undesired or substandard items early, and thus improves production efficiency [5]. Deep learning also promises to transform agriculture by providing more precise techniques for monitoring crop development, forecasting yields, and detecting plant diseases [14,15,16,17]. The literature on hazelnut classification includes many studies based on machine learning and deep learning methods; these are listed in Table 1.

Table 1 Related studies on hazelnut classification in the literature

Among the studies on hazelnut classification listed in Table 1, Gencturk et al. [5] classified hazelnuts and developed a mobile application for the task. Their dataset contained Ordu, Giresun, and Van hazelnuts, with 1324, 1165, and 1138 images, respectively. Using a hybrid InceptionV3 + ResNet50 deep learning model, they obtained 100% accuracy. Ünal and Aktaş [18] classified hazelnut kernels with a deep learning approach, using a dataset of 2094 kernel images in 3 classes. Among the deep learning models they evaluated, InceptionV3 achieved the highest accuracy, 98%. Menesatti et al. [19] classified four traditional Italian hazelnut varieties with a shape-based approach, using a dataset of 400 hazelnuts; the PLS-DA method yielded 95.1–97.6% accuracy. Giraudo et al. [20] used RGB image analysis to identify defective hazelnuts. Their dataset of 2000 images was evenly split between defective and defect-free hazelnuts, and a PLS-DA classifier detected defective hazelnuts with approximately 97% accuracy. Keleş and Taner [21] classified 11 hazelnut varieties according to their optical and mechanical properties, obtaining accuracies of 89.1% with ANN and 92.7% with DA.

The proposed Big Transfer (BiT)-M models have a general learning capability because they are pre-trained on a large-scale dataset. Through transfer learning, they carry the knowledge learned during pre-training over to new tasks, which improves performance on tasks with smaller, specialized datasets. Because the models already perform well on general tasks, fine-tuning them for a specific task requires less data, saving time and resources. The main disadvantage of BiT is that training such a model from scratch requires very large amounts of data and is time-consuming. The main contributions of this paper are as follows.

  • This paper addresses the main problems in hazelnut image classification, using Big Transfer deep learning to achieve high classification accuracy on images of three hazelnut varieties: Giresun, Ordu, and Van.

  • With the BiT-M approach, which is based on deep learning and image processing, high-accuracy detection and classification can help increase crop productivity and make agricultural activities more sustainable.

  • This paper proposes a new approach based on Big Transfer deep learning for hazelnut classification.

  • The proposed BiT-M model can learn representations that are robust to transformations such as rotation and scaling. This allows us to provide accurate classification even when the agricultural images are taken from different angles.

  • The proposed BiT-M model is capable of precise classification even in high-resolution agricultural images.

  • The pre-trained representations of BiT-M make it possible to train with less labeled agricultural data, which is useful in agricultural applications where labeled data are limited.

The rest of the paper is organized as follows. The next section describes the material (dataset), the proposed method, and the performance metrics. The experimental results and observations are then presented and discussed, and the final section gives the conclusions.

Material and method

Material

In this study, we used the largest and most popular publicly available hazelnut variety dataset [5]. It covers three hazelnut cultivars (Giresun, Ordu, and Van) grown in Turkey's main hazelnut-producing regions. The dataset contains 3627 carefully captured hazelnut images: 1165 of Giresun, 1324 of Ordu, and 1138 of Van hazelnuts. All photographs have a uniform black background, which removes distractions and keeps attention on the hazelnuts. The images are stored in JPG format at a resolution of 2000 × 2000 pixels, providing ample detail for analysis and modeling. Figure 1 shows randomly selected samples of each hazelnut variety.

Fig. 1 Images of hazelnut varieties

Deep features

Deep features refer to abstract representations of visual information derived from deep learning models, in particular, convolutional neural networks (CNNs) [26]. These features capture hierarchical and complex patterns in images, allowing machines to understand and discriminate between various elements in a scene. Unlike traditional computer vision techniques that rely on hand-crafted features, deep features are automatically learned through the layers of the network. This makes them extremely versatile and capable of understanding complex details, textures and shapes in an image. Deep features have enabled remarkable advances in areas such as image recognition, object detection and semantic segmentation, powering applications ranging from medical imaging to autonomous vehicles.

Proposed visual representation learning (BiT) method

The BiT method [27] relies on models pre-trained on a large dataset (e.g., JFT-300M, an internal Google dataset containing a very large collection of images gathered from various sources). BiT uses ResNet-v2 architectures as its pre-trained base models. The base model is first trained to classify the images in the large dataset, that is, to determine which object classes each image contains; this upstream stage lays the foundation of a general visual representation. The learned representation is then transferred to other datasets and tasks, including tasks not seen during pre-training. After transfer, the base model is fine-tuned on the smaller dataset of the target task. The fine-tuning stage can also introduce changes to the base model, such as a new task-specific head, a larger input resolution, and data augmentation. The success of BiT is measured by its transfer performance, i.e., how well the pre-trained model performs across various downstream tasks.

The BiT method shows that a representation pre-trained on a large dataset transfers well to a variety of tasks. This underlines the importance of general representation learning and indicates that larger pre-training datasets can improve transfer performance. Figure 2 shows the BiT model structure.

Fig. 2 Batch normalization vs group normalization and weight standardization

In Fig. 2, the upstream components are used during pre-training, and the downstream components are used when fine-tuning on a new task. The first upstream component is scale: deep networks generally perform better as they grow larger, and large datasets are known to benefit from large architectures. BiT replaces Batch Normalization (BN) with Group Normalization (GN) combined with Weight Standardization (WS). BN is ill-suited to BiT for two reasons. First, when large models are trained with small per-device batches, BN performs poorly, and synchronizing its statistics across devices adds communication cost. Second, BN's running statistics must be updated when the model is transferred, which can hurt transfer performance. Fig. 3 illustrates the normalization techniques and weight standardization.

Fig. 3 Representation of normalization techniques and weight standardization

Combining GN and WS has been shown to improve small-batch training performance on COCO and ImageNet [28]. The BiT authors further show that the combination is also useful when training with large batch sizes and has a significant impact on transfer learning.
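A minimal NumPy sketch of the two layers can make the difference from BN concrete. It assumes NCHW activations and (out, in, kh, kw) kernels; the function names, the `eps` value, and the small `num_groups` in the demo are illustrative choices, not taken from this study or the BiT code, and the learnable scale and shift of GN are omitted. The demo checks the property that matters for small per-device batches: GN normalizes each sample on its own, whereas training-mode BN output for one image changes with its batch mates.

```python
import numpy as np

def weight_standardize(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Weight Standardization: zero mean, unit variance per output channel.

    w has shape (out_channels, in_channels, kh, kw).
    """
    mean = w.mean(axis=(1, 2, 3), keepdims=True)
    var = w.var(axis=(1, 2, 3), keepdims=True)
    return (w - mean) / np.sqrt(var + eps)

def group_norm(x: np.ndarray, num_groups: int, eps: float = 1e-5) -> np.ndarray:
    """Group Normalization over (C/G, H, W) for each sample and group.

    x has shape (N, C, H, W); learnable scale/shift are omitted for brevity.
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def batch_norm_train(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode Batch Normalization: statistics over (N, H, W) per channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 5, 5))                       # a tiny batch of activations
kernel = rng.normal(loc=0.3, scale=2.0, size=(16, 8, 3, 3))
kernel_ws = weight_standardize(kernel)                  # kernel a WS convolution would use

# GN: sample 0 gets the same output whether it is alone or in a batch of 4.
gn_same = np.allclose(group_norm(x, 4)[:1], group_norm(x[:1], 4))
# BN: sample 0's output depends on the other samples in its batch.
bn_same = np.allclose(batch_norm_train(x)[:1], batch_norm_train(x[:1]))
print(gn_same, bn_same)  # True False
```

Because GN never mixes statistics across samples, it needs no cross-device synchronization and no running averages to re-estimate at transfer time, which is the motivation given for using it in BiT.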

To correct the discrepancy between training and test resolutions, it is common to increase the resolution by a small factor at test time. Alternatively, a step can be added in which the trained model is fine-tuned to the test resolution [29]. The latter is well suited to transfer learning because the resolution change is incorporated into the fine-tuning stage. Sufficient regularization is obtained by choosing an appropriate schedule length, i.e., training longer on larger datasets.
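The resolution and schedule-length choices above belong to the fine-tuning heuristic that the original BiT paper calls BiT-HyperRule. The sketch below encodes the values reported there: SGD with a base learning rate of 0.003 and batch size 512, a 10× learning-rate decay at 30%, 60%, and 90% of training, 500 / 10k / 20k steps for datasets with fewer than 20k, fewer than 500k, and more examples, MixUp with α = 0.1 except for the smallest tasks, and 160 → 128 crops for images smaller than 96 × 96 versus 448 → 384 otherwise. This study does not state which fine-tuning settings it used, so treat this as an assumption; `FineTuneSettings` and `bit_hyperrule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FineTuneSettings:
    steps: int                       # total fine-tuning steps
    resize: int                      # images are resized to resize x resize
    crop: int                        # random crop size used during training
    mixup_alpha: float               # 0.0 means MixUp is disabled
    lr_decay_steps: Tuple[int, ...]  # learning rate is divided by 10 at these steps
    base_lr: float = 0.003
    batch_size: int = 512

def bit_hyperrule(num_train_examples: int, image_hw: Tuple[int, int]) -> FineTuneSettings:
    """Sketch of BiT-HyperRule as reported in the BiT paper (assumed settings)."""
    h, w = image_hw
    # Resolution rule: images with area below 96x96 get 160/128, others 448/384.
    resize, crop = (160, 128) if h * w < 96 * 96 else (448, 384)
    # Schedule-length rule: training longer on larger datasets acts as regularization.
    if num_train_examples < 20_000:
        steps, mixup_alpha = 500, 0.0      # small tasks: no MixUp
    elif num_train_examples < 500_000:
        steps, mixup_alpha = 10_000, 0.1
    else:
        steps, mixup_alpha = 20_000, 0.1
    decay = tuple(round(steps * f) for f in (0.3, 0.6, 0.9))
    return FineTuneSettings(steps, resize, crop, mixup_alpha, decay)

# The training sets in this study have at most 3627 images of 2000 x 2000 pixels.
settings = bit_hyperrule(num_train_examples=3627, image_hw=(2000, 2000))
print(settings.steps, settings.resize, settings.crop, settings.lr_decay_steps)
# 500 448 384 (150, 300, 450)
```

Under this rule, both experimental configurations fall into the smallest bucket, so the schedule length, not explicit regularizers such as dropout or weight decay, would control overfitting.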

In this work, the BiT-M R50 × 1, BiT-M R101 × 3, and BiT-M R152 × 4 architectures are employed. R50 × 1 is based on the standard ResNet50-v2 architecture, R101 × 3 is three times wider than the standard ResNet101-v2, and R152 × 4 is four times wider than the standard ResNet152-v2. In the BiT versions, Group Normalization replaces the Batch Normalization layers of the original architectures, and Weight Standardization is applied in the convolutional layers. All of these architectures were pre-trained on ImageNet-21k, which contains 14 million images in 21,843 classes. The pre-logit feature vector computed for each image has 2048 dimensions for R50 × 1; because the width multiplier scales the channel count of the final block, it has 6144 dimensions for R101 × 3 and 8192 for R152 × 4. The overall proposed method is shown in Fig. 4.
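Transferring a BiT-M backbone to the three hazelnut classes amounts to discarding the upstream classifier and attaching a new linear head on the pre-logit features. The sketch below assumes the R50 × 1 feature width of 2048 and, as the public BiT fine-tuning code does, initializes the new head to zero so that every class starts with equal probability. Random vectors stand in for real backbone outputs, and `ZeroInitHead` is a hypothetical name; the sketch shows only the head, not backbone fine-tuning.

```python
import numpy as np

CLASSES = ("Giresun", "Ordu", "Van")
FEATURE_DIM = 2048  # pre-logit width assumed for BiT-M R50 x 1

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ZeroInitHead:
    """Hypothetical task head: linear map from backbone features to class probabilities."""

    def __init__(self, in_dim: int, num_classes: int):
        self.weight = np.zeros((in_dim, num_classes))  # zero init, as in the public BiT code
        self.bias = np.zeros(num_classes)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        return softmax(features @ self.weight + self.bias)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, FEATURE_DIM))  # stand-in for backbone outputs of 5 images
head = ZeroInitHead(FEATURE_DIM, len(CLASSES))
probs = head(features)
print(probs.shape)  # (5, 3); every row is [1/3, 1/3, 1/3] before fine-tuning
```

Starting from a zero head means the first gradient steps are driven only by the labels, not by an arbitrary random projection of the pre-trained features.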

Fig. 4 Overall proposed method

Performance metrics

Performance metrics serve as essential benchmarks for evaluating the efficacy of deep learning models and play a pivotal role in informed decision-making. They are indispensable for model evaluation, guiding optimization, reporting results, identifying errors and biases, establishing reference points, and detecting overfitting. In deep learning, accuracy, sensitivity, precision, and F1-score are commonly used metrics [30,31,32,33,34], where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

$$\text{Accuracy}=\frac{TN+TP}{TN+FP+TP+FN} \tag{1}$$
$$\text{Precision}=\frac{TP}{TP+FP} \tag{2}$$
$$\text{Sensitivity}=\frac{TP}{TP+FN} \tag{3}$$
$$\text{F1-score}=2\times\frac{\text{Precision}\times\text{Sensitivity}}{\text{Precision}+\text{Sensitivity}} \tag{4}$$
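Equations (1)–(4) are written for a binary problem. For the three hazelnut classes they are typically applied one-vs-rest to each class of the confusion matrix and then averaged. The paper does not state which averaging it used, so the macro average below is an assumption, and the 3 × 3 matrix in the example is made up for illustration.

```python
from typing import Dict, List

def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics per class; cm[i][j] counts true class i predicted as j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp     # other classes predicted as c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        results.append({
            "accuracy": (tp + tn) / total,            # Eq. (1), one-vs-rest
            "precision": precision,                   # Eq. (2)
            "sensitivity": sensitivity,               # Eq. (3)
            "f1": f1,                                 # Eq. (4)
        })
    return results

def macro_average(per_class: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over the classes (assumed averaging scheme)."""
    return {key: sum(m[key] for m in per_class) / len(per_class) for key in per_class[0]}

def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all images that lie on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

# Made-up confusion matrix; rows and columns ordered Giresun, Ordu, Van.
cm = [[10, 0, 0],
      [1, 9, 0],
      [0, 0, 10]]
per_class = per_class_metrics(cm)
print(round(overall_accuracy(cm), 4))  # 0.9667
print({k: round(v, 4) for k, v in macro_average(per_class).items()})
```

Note that the one-vs-rest accuracy of Eq. (1) for a single class is generally higher than the overall multi-class accuracy, so it matters which of the two a reported "accuracy" refers to.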

Experimental results

In the experiments, BiT models were trained and tested based on pre-trained ResNet models. The experiments have been conducted in two phases with two different configurations. In the first phase, the entire dataset has been used, while in the second phase, only 1/10 of the dataset has been used. This process is implemented to demonstrate the potential performance of the proposed BiT architecture on small training sets.

In this study, CNN-based BiT models were used to classify hazelnut varieties: BiT-M R50 × 1, BiT-M R101 × 3, and BiT-M R152 × 4 were evaluated in the experiments. All models were initialized with ImageNet-21k pre-trained weights to ensure faster convergence and better accuracy. For more robust results, each experiment was repeated 5 times. CNNs, among the most widely used deep learning architectures, are extensively employed in agricultural product analysis. The hazelnut varieties dataset was used to train and evaluate state-of-the-art CNN architectures from the literature, including the most recent ones. All models were evaluated with default settings under a common reference point.

Configuration-1

In this configuration, the entire dataset was used for training and testing. The data-related values for this configuration are presented in Table 2.

Table 2 Configuration-1 for hazelnut dataset

The confusion matrices obtained according to the experimental results are presented below.

Fig. 5 shows the confusion matrices of the BiT models for the Giresun, Ordu, and Van hazelnut varieties. The confusion matrices indicate that the BiT-M R152 × 4 model performs best: in its matrix, the Giresun class was correctly predicted 175 times with no misclassifications, and the model distinguishes well among the Giresun, Ordu, and Van classes, with few misclassifications overall.

The experimental evaluation results of the sensitivity, precision, accuracy and F1-score criteria of the BiT-M R50 × 1, BiT-M R101 × 3 and BiT-M R152 × 4 models used for hazelnut variety classification are presented in Table 3.

Table 3 Performance results for hazelnut variety classification (Configuration-1)

Table 3 shows that the BiT-M R50 × 1, BiT-M R101 × 3, and BiT-M R152 × 4 models all achieve high accuracy, precision, and F1-score values, demonstrating their effectiveness in identifying hazelnut varieties. The BiT-M R152 × 4 model stands out, with an accuracy of 0.9982 and a precision of 1.0000, indicating very few misclassifications. All models also have high sensitivity, correctly recognizing most positive cases. These findings highlight the reliability and robustness of the CNN-based models in categorizing hazelnut varieties.

Configuration-2

In this configuration, only 1/10 of the data was used for training and testing. Unlike in the first configuration, the data were split into 70% for training and 30% for testing. The data-related values for this configuration are presented in Table 4.
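How the 1/10 subset was drawn is not specified. One plausible reading is a per-class (stratified) random sample followed by a per-class 70/30 split, sketched below with Python's standard library. The stratification, rounding fractional counts down, the fixed seed, and the name `stratified_subsample_split` are assumptions, not details from this study; only the class sizes come from the dataset description.

```python
import math
import random
from collections import defaultdict
from fractions import Fraction
from typing import List, Sequence, Tuple

def stratified_subsample_split(
    labels: Sequence[str],
    keep: Fraction = Fraction(1, 10),
    train_share: Fraction = Fraction(7, 10),
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Keep a random `keep` fraction of each class, then split each class train/test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        kept = idxs[: math.floor(len(idxs) * keep)]   # exact fractions, rounded down
        n_train = math.floor(len(kept) * train_share)
        train_idx.extend(kept[:n_train])
        test_idx.extend(kept[n_train:])
    return train_idx, test_idx

# Class sizes of the hazelnut dataset; the label list stands in for real file paths.
counts = {"Giresun": 1165, "Ordu": 1324, "Van": 1138}
labels = [name for name, n in counts.items() for _ in range(n)]
train_idx, test_idx = stratified_subsample_split(labels)
print(len(train_idx), len(test_idx))  # 252 109
```

Stratifying keeps the class proportions of the reduced set close to those of the full dataset; a different rounding rule would shift these counts by one or two images.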

Table 4 Configuration-2 for hazelnut dataset

The confusion matrices obtained according to the experimental results are presented below.

Fig. 6 shows the confusion matrices of the BiT models for the Giresun, Ordu, and Van hazelnut varieties in this configuration. Again, the BiT-M R152 × 4 model performs best and distinguishes well among the three classes, with few misclassifications.

Fig. 5 The confusion matrix of BiT models classification results for configuration-1

The experimental evaluation results of the sensitivity, precision, accuracy and F1-score criteria of the BiT-M R50 × 1, BiT-M R101 × 3 and BiT-M R152 × 4 models used for hazelnut variety classification are presented in Table 5.

Table 5 Performance results for hazelnut variety classification (Configuration-2)

Table 5 presents the classification results of the three models in this configuration. BiT-M R50 × 1 achieved an accuracy of 0.9630, with precision and sensitivity of 0.9706 and 0.9429, respectively, giving an F1-score of 0.9565. BiT-M R101 × 3 performed slightly better, with an accuracy of 0.9722, precision of 0.9750, sensitivity of 0.9512, and F1-score of 0.9630. BiT-M R152 × 4 performed best, with an accuracy of 0.9907, a precision of 1.0000, a sensitivity of 0.9714, and an F1-score of 0.9855. High accuracy indicates strong overall classification capacity, but because accuracy alone can mask class-specific errors, precision, sensitivity, and F1-score are reported alongside it. High precision indicates that positive predictions are reliable, and high sensitivity indicates a high true positive rate. High F1-scores show that the model balances precision and sensitivity well, meaning that both false positives and false negatives are few. Overall, these results identify BiT-M R152 × 4 as the best-performing of the three models for hazelnut variety classification; its comparison with other studies is given in Table 6.
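As a consistency check of Eq. (4), the F1-scores in Table 5 can be recomputed from the reported precision and sensitivity values; all three agree with the reported scores to four decimal places:

$$
\begin{aligned}
\text{R50}\times1:&\quad \frac{2\times0.9706\times0.9429}{0.9706+0.9429}\approx0.9565\\
\text{R101}\times3:&\quad \frac{2\times0.9750\times0.9512}{0.9750+0.9512}\approx0.9630\\
\text{R152}\times4:&\quad \frac{2\times1.0000\times0.9714}{1.0000+0.9714}\approx0.9855
\end{aligned}
$$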

Table 6 Performance comparison of the proposed technique with other studies

Discussion

Compared with other methods in the literature, the proposed method achieves its results with a single base model, without external enhancement, preprocessing, feature fusion, or hybridization. BiT uses large networks pre-trained on large datasets to learn general features, and it performs well even on smaller datasets: the experiments with two distinct configurations show that the model consistently achieves high accuracy even when the dataset is reduced to 1/10 of its original size. This underlines the model's robustness and efficiency under data constraints. In addition, the proposed method requires less training time, which gives it a cost advantage over other models while maintaining high performance. Table 6 compares the proposed method with prominent studies in the literature.

Fig. 6 The confusion matrix of BiT models classification results for configuration-2

Table 6 compares the classification methods used in various studies with the proposed method. Trained on a dataset of 3627 images across three classes, the proposed BiT-based method achieves an accuracy of 99.82%, which is notable given the number of classes and images. Big Transfer therefore appears to be a strong choice, especially for applications that involve large datasets and require high accuracy. However, the generalizability and applicability of these results may require further evaluation and comparison in the context of each specific application. Overall, the proposed method classifies hazelnut varieties accurately and reliably, is competitive with existing approaches, and shows potential for practical applications in agricultural research and quality control. The method also has limitations that may affect the usability and performance of pre-trained models, so the suitability of a pre-trained model should be assessed for each application. In particular, training the proposed BiT-M model requires significant computational resources, such as high-performance GPUs or TPUs, which can be a limiting factor for researchers or organizations with limited access to such resources.

Conclusions

This study explores the use of deep learning to improve hazelnut classification, aiming to reduce the manual labor, time, and costs associated with sorting. The experiments targeted three hazelnut varieties, Giresun, Ordu, and Van, using a dataset of 3627 images: 1165 of Giresun, 1324 of Ordu, and 1138 of Van hazelnuts. Big Transfer (BiT) deep learning models were used for classification, and the BiT-M R152 × 4 model achieved an accuracy of 99.49%. These findings underscore the potential of deep learning techniques to reshape hazelnut classification. Such approaches can improve efficiency and sustainability in the agricultural industry and foster the development of innovative, patentable products and devices, thereby contributing to the country's economic growth. A limitation is that training the BiT-M model requires significant computational resources, such as high-performance GPUs or TPUs, which may be unavailable to some researchers and organizations. In future work, we plan to address the model's large number of parameters and to further optimize some segmentations.