Introduction

The development of artificial intelligence has brought great convenience to our daily lives. In the agricultural field, it is increasingly being used in planting, cultivation, pest control, harvesting, post-harvest product classification, and other areas [1].

In the area of agricultural robotics, automatic fruit classification and sorting is a crucial application. To distinguish between distinct fruit types for fruit identification and picking with the help of robotic platforms during harvest, automatic fruit select systems can be used. They are also applicable to product selection, supermarkets, price setting for post-harvest quality evaluation, and quick invoicing for the packaging sector. In order to increase fruit quality and output, efforts are being made to replace the manual processes of picking and sorting fruit with automated systems that use both machine vision and machine learning techniques.

The homeland of cherry (Prunus avium L.), one of the most preferred fruits in the world, is known as the region between the South Caucasus, the Caspian Sea, and Northeastern Anatolia [2]. Cherry, a member of the Rosaceae family, is an industrial plant consumed as fresh fruit, fruit juice, wine, jam, preserves, compotes, and dried fruit, the image of which is given in Fig. 1. Cherry cultivation in the world is generally carried out in 35º–55º north and south latitudes, but unlike these latitudes, it is also carried out in areas where the required temperature and other factors are suitable [3]. When the cherry exports are examined according to the data of 2022 by countries, the first three countries that export the most cherries are Chile with 274 thousand tons, Hong Kong with 165 thousand tons, and Turkey with 87 thousand tons. The most important cherry variety grown in Turkey is Ziraat900, also known as “Turkish Cherry.” In addition, Van, Bing, Early Burlat, Regina, Lambert, and Sweetheart varieties are also available [4]. Some physical, utilization, and product characteristics of cherry fruit are given in Table 1.

Fig. 1
figure 1

Uluborlu cherry orchard

Table 1 Physical, usage, and product properties of cherry fruit

Thanks to its unique aroma, high nutritional value, high pH value, and flamboyant fruits, cherry is one of the most consumed fruits in the world, apart from a few varieties used in industry. 26% of the world’s cherry production is carried out in Turkey, of which only 10% can be exported [5]. According to these data, although Turkey's cherry production area is the leader in the world in terms of cherry production, it ranks third or fourth in terms of exports, depending on the years. One of the reasons for this is the difficulty in supplying quality cherries demanded by exporters. The most important problem of cherry orchards is low yield and irregularity. Although there are many different reasons for this situation, the most important reason is that the correct pollinator selection is not made in the garden facility and not planting enough pollinator varieties. Pollination and fertilization are the most important issues in establishing a cherry orchard. With the exception of a few varieties, cherries are self-incompatible and cannot self-fertilize [6]. Table 2 lists the main cherry varieties and their pollinators. If there is no pollinator variety in the orchard, usually 3–5 out of every 100 flowers turn into fruit. In economic terms, 25 of every 100 flowers can be considered sufficient for fruiting. In cherry cultivation, 25–40% is good yield and more than 40% fruit setting is excellent. Fruit set can be 70–80% under favorable conditions.

Table 2 Main cherry varieties and pollinators

It is an economic necessity to plant cherries that pollinate each other in cherry production areas. At the time of cherry fruit collection, agricultural laborers cannot distinguish between cherry species, so they collect different cherry species in the same place during collection. Cherry species have different characteristics such as price, storage, and consumption times. For this reason, especially in cherry processing and packaging facilities, there is a need for information technologies that enable cherry species to be separated from each other with high accuracy.

Because of this, classification procedures are crucial for ensuring that the product meets quality standards and that consumers receive fruits of the desired caliber. Currently, employees typically grade fruit manually and based on their own experience. The fruit packing industry faces many problems often related to workers, such as increased manual labor costs, product wastage due to incorrect grading, and other human errors. In addition to the heterogeneity of hand selection, this method takes a lot of time and money. To speed up classification and classification tasks, as well as to boost productivity and accuracy, a high performance machinery system is needed. This is because the rate of sales of the product in the market is determined by the efficacy and efficiency of monitoring and quality control processes, as well as by reducing the costs of processing agricultural products [7].

In order to provide such a system, although there are various studies on different fruits in the literature, there is no CNN-based experimental study on cherry species grown in Turkey.

The main contributions of this research to the literature are as follows:

  1. 1.

    A new fruit dataset called ISP7Cherry has been created for cherry, one of the most important economic products of Turkey, and scientists working in this field.

  2. 2.

    In order to avoid the limitations of a single classifier to classify seven different cherry varieties, six CNN models (DenseNet169, MobileNet, VGG19, InceptionV3, ResNet152, and NASNet) were used for feature extraction from images.

  3. 3.

    In order to obtain the best classification performance, the Maximum Voting method (ensemble learning) of the two neural networks with the highest results was used.

  4. 4.

    Seven different cherry varieties were classified with a high success rate.

  5. 5.

    The proposed system provided fast, reliable, and robust classification of cherry varieties with less equipment instead of using special and costly imaging devices.

In the second part of the article, studies in the field of agriculture for different fruit types and cherries are presented. In the third part of the paper, the dataset used in this study, the properties of deep learning networks and CNN algorithms, ensemble learning, and performance evaluation criteria are reported. In the fourth section, the experimental environments, the results obtained from the algorithms used in the study are compared, the two neural networks with the best performance are brought together and the model that gives the best result according to the Maximum Voting method is found. In the fifth section, the results of this study and information about future studies are given.

Literature review

Automated fruit classification is a compelling issue within the realms of fruit cultivation and the retail chain. It holds great potential for fruit growers and supermarkets, as it enables them to identify various types of fruits and assess their condition within stock or containers. By doing so, it enhances production efficiency, ultimately leading to increased commercial profit. As a result, there has been significant research and exploration in the past decade into intelligent systems that leverage computer vision and machine learning techniques for tasks such as fruit defect identification, ripeness grading, and classification.

In the domain of automatic fruit classification, there are two main approaches: traditional computer vision-based methods and deep learning-based methods. Traditional methods involve extracting low-level features from the images and utilizing traditional machine learning techniques for image classification. On the other hand, deep learning methods excel at automatically extracting features and performing end-to-end image classification. For instance, Joseph et al. [8] demonstrated the effectiveness of deep learning-based methods in fruit classification. In traditional computer vision and image processing techniques, various image features such as texture, color, and shape are used as input for fruit classification. For example, Muhammad [9] proposed an automatic classification system for "Date Palm" fruit using handcrafted features. A classification accuracy of 98% was reported with the Support Vector Machine (SVM) classifier. Similarly, Kheiralipour and Pormah [10] developed a cucumber shape classification system based on shape features including area, perimeter, eccentricity, extent, roundness, compressions, width-non-homogeneity, and center of gravity-non-homogeneity. By training a neural network (NN) to classify cucumbers into desired and undesired types, they achieved an impressive classification accuracy of 97.10%. These studies highlight the efficacy of both traditional and deep learning-based approaches in automated fruit classification, paving the way for further advancements in this field.

Fruit identification and classification have emerged as one of the latest trends in the field of computer vision. This area of computer science has gained significant attention due to its potential applications. The accuracy of a fruit classification system depends on several factors, including the quality of the fruit images collected, the number and types of features extracted, the selection of optimal classification features, and the choice of classifier utilized. In a study conducted by Shahi et al.[11], they employed a novel attention convolution module based on MobileNetV2 for fruit classification using three different datasets. These datasets consisted of fruits and vegetables with 15 classes, Indian fruits with 12 classes, and fruits and vegetables with 53 classes. Impressively, the classification accuracies achieved were 95.75%, 96.74%, and 96.23% for each dataset, respectively. De Goma et al. [12] focused on extracting color, size, shape, and texture features from fruit images belonging to fifteen classes. They compared the performance of four different classifiers, namely, K-Nearest Neighbors (KNN), Naïve Bayes, Decision Trees, and Bagging classifier. The KNN classifier yielded the best results with an accuracy of 81.9%. These studies demonstrate the ongoing advancements in fruit identification and classification, with researchers exploring various techniques and classifiers to improve accuracy and performance in this domain.

In recent years, deep learning architectures have gained significant traction for the successful detection of various fruit types. For instance, Xiong et al.[13] employed the YOLOv2 model and images captured from a UAV to detect mangoes in orchards. Wang et al. [14] utilized a variant of the R-CNN Mask network to identify and segment waxberry berries. Kang and Chen [15] focused on fruit detection using deep neural networks, aiming to generate a 3D visualization to guide a harvesting robot. Their approach enabled precise navigation and targeting. In the realm of fruit classification, Katarzyna and Paweł [16] proposed a 9-layer deep neural network capable of accurately classifying six different apple varieties with an impressive accuracy rate of 99.78%. Nosseir and Ahmed [17] developed an algorithm utilizing color and texture features for fruit type identification, along with a KNN classifier, and employed an SVM classifier for distinguishing between rotten and fresh fruit. Furthermore, several studies have addressed the detection of multiple fruit species. Sa et al.[18] utilized RGB and NIR images of six different fruits, employing the fastest R-CNN model for detection. Vasconez et al.[19] performed a comparative study using the Faster R-CNN model with InceptionV2 and Single Shot Multibox Detector (SSD) with MobileNet for the detection and counting of three fruit species. Their findings demonstrated that the Faster R-CNN architecture yielded superior accuracy and fruit counting performance compared to SSD. Additionally, Macanhã et al.[20] conducted a classification task involving 15 fruit classes, employing Multilayer Perceptron (MLP) and KNN as classifiers. They achieved an impressive accuracy rate of 97.5% using the best-performing feature identification methods. Bal and Kayaalp [21] extracted the features of the images with Convolutional Neural Network (CNN) architecture to predict the yield of apple fruit and then performed the classification process with machine and ensemble learning methods. Kang and Gwak [22] used ensemble learning based on ResNet50 and ResNet101 pre-trained CNN models to distinguish between fresh and rotten fruits. These studies collectively showcase the successful implementation of deep learning architectures for fruit detection, identification, segmentation, and classification, with significant advancements in accuracy and performance.

When reviewing the existing literature on cherry fruit, it becomes evident that it has been extensively utilized in various domains, including classification, ripeness grading, and harvest estimation. Seeking to establish an alternative grading method for cherries that supersedes traditional approaches, Momeny et al. [23] proposed a CNN algorithm that incorporates a hybrid pooling technique. This algorithm aimed to enhance the classification of cherries, improve their exportability, and detect visual aspects associated with cherries. To evaluate the performance of their method, they compared it with other machine learning approaches such as KNN and ANN, as well as traditional feature extraction methods like LBP and HOG. The results demonstrated the superior success of the CNN approach, achieving an impressive accuracy rate of 99.4%. The utilization of deep learning-based CNN methods for successful cherry detection and classification has gained prominence due to their inherent advantages, including the elimination of feature extraction requirements and the ability to discern intricate details that may even elude human perception. In another study conducted by Villacrés and Cheein [24], RGB images obtained from a cherry field in Chile were processed using the Faster R-CNN framework to estimate the harvest. Their approach yielded approximately 85% accuracy in harvest estimation. Overall, these studies highlight the growing adoption of advanced techniques, such as CNN and Faster R-CNN, to enhance the detection, classification, and estimation processes related to cherries, leading to improved efficiency and accuracy in cherry-related applications.

Gai et al. [25] undertook the development of a deep learning algorithm for cherry fruit detection utilizing an enhanced version of the YOLOv4 model. In order to optimize the feature extraction and detection speed of the network, they replaced CSPDarknet53 in the YOLOv4 backbone network with DenseNet for small-volume cherry fruits. Furthermore, the activation function was modified to Leaky ReLU. The study focused on detecting cherries within three distinct classes: ripe, semi-ripe, and unripe fruits. The algorithm achieved an impressive F1-Score of 94.7%. Another study conducted by Ropelewska et al. [26] aimed to develop discriminative models for cherry classification based on texture parameters derived from image analysis of three different cherry varieties, namely "Büttner's Red," "Kordia," and "Lapins." The images were converted into R, G, B, L, a, b, X, Y, and Z color channels. To perform the classification, the researchers employed various methods including Naive Bayes, Multilayer Perceptron, Multi-class classifier, and Random Forest. Reyes et al. [27] converted the images from a dataset of 63 images of five different classes grown in Chile into RGB and HSV color space. They used Bayesian classification method to classify 36 test images with 100% success rate. Li et al. [28] created a dataset of 36,068 samples for the classification of immature, semi-ripe, and ripe sweet cherries. With the classification model named YOLOX-EIoU-CBAM, they classified the degree of ripeness of cherries with an accuracy rate of 84.96%. Gai et al. [29] developed the YOLOv5s cherry model to classify four different ripeness states of cherries based on the YOLOv5s model. Using 4416 images in their dataset, they classified cherries as sparse immature cherry, sparse mature cherry, dense immature cherry, and dense mature cherry with an accuracy of 96.77%. These studies demonstrate the application of advanced techniques such as enhanced YOLOv4 models and texture-based classification methods to improve cherry fruit detection and classification, resulting in high accuracy rates and enhanced performance. Studies in the literature on the fruit and vegetable classification are given in Table 3.

Table 3 Fruit and vegetable classification literature review

When the literature is examined, it is seen that studies based on deep learning methods have been carried out to distinguish many fruit species from each other or to classify different species belonging to the same fruit. In the studies on cherries, it is seen that harvest estimation, determination of ripeness degree and classification of a small number of species are performed. However, this study aims to contribute as one of the first studies in the literature with the ensemble learning method based on CNN models for the classification of cherry species grown in Isparta province.

Material and methods

Image datasets

Images in the dataset created for this study were obtained from different cherry orchards located in Isparta province Uluborlu Center district, Küçükkabaca and Dereköy villages, Senirkent Center district, Büyükkabaca, Gençali, and Yassıören villages between June and July 2022 and June and July 2023. Images were obtained in RGB format with 3456 × 3456 pixel dimensions, 72 dpi horizontal and vertical resolution, and 24-bit depth, with a Samsung SM-M317F mobile phone in a controlled environment. The images were acquired from a distance of 50 cm on an illuminated white background. In the created dataset, images of Alyanak, Lambert, North Wonder, Regina, Stella, Sweetheart, and Ziraat900 fruits are shown in Fig. 2.

Fig. 2
figure 2

Images of cherry varieties in the dataset a Alyanak, b Lambert, c North Wonder, d Regina, e Stella, f Sweetheart, g Ziraat900

Detection model architecture

In the experimental study for deep learning-based classification of cherry varieties, a dataset was created from seven different fruit images (Alyanak, Lambert, North Wonder, Regina, Stella, Sweetheart, Ziraat900). In order to perform a better classification study from the images in the dataset, data augmentation was performed on the images with different methods. After these processes, the dataset was divided into three parts as training, validation, and testing in order to apply six different deep learning algorithms. The system architecture used in this study is presented in Fig. 3. The models obtained as a result of training were given fruit images that they had never seen before and the classification success of the models was measured. Successful models were further improved by ensemble learning method.

Fig. 3
figure 3

Block diagram of the proposed system

The images in the dataset were preprocessed from 3456 × 3456 pixel size to 224 × 224 × 3 size. The 3570 images in the dataset were randomly selected and first divided into training (80%) and test (20%) datasets as shown in Fig. 4. The data allocated for the training set are divided into training (80%) and validation (20%). In order to train the models with more data, data replication method was applied. Five different methods were used in the data augmentation process. These included rotation of the images up to 30 degrees (rotation_range = 30), horizontal and vertical shifts of 25% of the image size (width_shift_range = 0.25, height_shift_range = 0.25), zooming of the images in a range of 15% (zoom_range = 0.15), and random horizontal mirroring of the images (horizontal_flip = True). Thus, a total of 11,424 images were obtained for the training dataset, 2856 images for validation, and 3570 images for testing. The train, validation, and test numbers of the images in the original and augmented dataset used in this study are shown in Table 4.

Fig. 4
figure 4

Training, validation, and test data parts for transfer learning methods

Table 4 Number of cherry varieties in the original and augmented dataset

Deep neural networks

Deep learning models are used for object detection and classification processes from images in different research areas such as transportation, defense industry, and agriculture [30]. It is used in many researches such as predicting traffic density in transportation, labeling military vehicles in the defense industry, and detecting plant diseases in agriculture. Deep learning algorithms owe their successful results in many areas such as prediction, classification, speech recognition, visual object recognition, and object detection to the hidden layers in their content. Some of the Deep Learning algorithms used for these purposes are CNN, Autoencoder (AE), Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), and Gated Recurrent Unit (GRU). Since CNN algorithms are successful in classification problems, they are preferred by researchers in research on this subject. In this experimental study, six popular architectures were used to classify cherry species grown in Isparta province: DenseNet169, MobileNet, VGG19, InceptionV3, ResNet152, and NASNet. Brief information about the CNN algorithms used in this study is given.

DenseNet

The Dense Convolutional Network (DenseNet) garnered significant recognition, receiving the best paper award at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR), and accumulating over 2000 citations. This network architecture exhibits a unique characteristic wherein each layer is directly connected to every other layer in a feed-forward manner. In the case of DenseNet, every layer treats the feature maps from all preceding layers as distinct inputs, while its own feature maps are shared as inputs with all subsequent layers. Notably, DenseNet achieves comparable accuracy to ResNet on the expansive ImageNet dataset, despite utilizing approximately half the number of parameters. Multiple models of DenseNet have been developed, employing varying configurations of Dense Blocks, such as DenseNet-121, DenseNet-169, DenseNet-201, and DenseNet-264. This pioneering architecture has significantly impacted the field of deep learning and has demonstrated its efficacy in various image classification tasks [31].

MobileNet

Traditional CNN has demonstrated remarkable performance in image recognition tasks. However, they demand substantial computational resources for both training and testing, necessitating powerful GPUs and significant memory capacity. To address this issue, a solution known as MobileNet was introduced in 2017 as an alternative to conventional convolutional filters. MobileNet is specifically designed as a lightweight deep neural network, with the primary objective of enabling implementation on mobile and embedded devices that have inherent computational limitations. Google played a significant role in proposing and developing MobileNet, recognizing the need for efficient models tailored to mobile platforms. The introduction of MobileNet has paved the way for deploying deep learning algorithms on resource-constrained devices, expanding the accessibility and applicability of image recognition technology [32].

VGG19

VGG-19 is a 19-layer version of the Visual Geometry Group (VGG) architecture. The VGG-19 architecture includes similar layers that share weights and use the same weights. This speeds up the model learning process and reduces the number of parameters. This architecture uses convolutional filters of size 3 × 3 and usually contains 5 convolutional layers with filter numbers of 64, 128, 256, 512, and 512, respectively. Furthermore, the VGG-19 architecture includes 3 fully connected layers and the softmax function is used for multiclassification between these layers [33].

ResNet

ResNet is a type of neural network introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their paper "Deep Residual Learning for Image Recognition" to facilitate the training of significantly deeper networks and won first place in the ILSVRC 2015 classification task with an error rate of 3.57%. In order to solve the distortion/optimization problems encountered in the optimization of deep models, the ResNet architecture was developed, which uses a 34-layer flat network architecture with fewer filters and lower complexity than VGG networks based on the VGG-19 architecture. The architecture was then transformed into a residual network by adding jump links or residual blocks to this flat network [34].

InceptionV3

Inception V3, which won the ILSVRC 2014 image classification competition, was introduced to the literature by Szegedy et al. The model, which has 299 × 299 pixels of input data, has 48 layers and approximately more than 23 million parameters. Unlike previous inception models, this model uses 1 × 1 and 3 × 3 convolutions instead of 5 × 5 convolutions to reduce the time cost [35].

NASNet

Neural Architecture Search (NAS) was developed by Google Brain and classified ImageNet 1.2% better than previously published studies in 2017. The NAS algorithm, which consists of three components, Search Space, Search Strategy, and Performance Estimation Strategy, is scalable CNN architectures and consists of separable convolution and reduction blocks with reinforcement learning method [36]. Furthermore, NAS algorithms can be classified into three different categories: evolutionary computation (EC), gradient, and reinforcement learning (RL) [37].

Ensemble learning

Ensemble learning is a machine learning technique that has achieved great success in various artificial intelligence applications using multiple base learners to form an ensemble learner to achieve better generalization of the learning system. Ensemble learning strategies are more successful than the classical single learning approach in terms of better classification results. The main premise of ensemble learning is that by combining multiple models, the errors of a single trigger will likely be compensated by other causes, and as a result, the overall prediction performance of the ensemble will be better than that of a single cause. The reasons why ensemble methods often improve prediction performance are Overfitting avoidance, Computational advantage, and Representation. In addition, ensemble methods are used to reduce the difficulties of machine learning algorithms such as Class imbalance, Concept drift, and Curse of dimensionality [38].

It is possible to divide ensemble learning into two as simple and advanced. Maximum Voting, Averaging, and Weighted Averaging methods can be classified as simple, while Stacking, Bagging, and Boosting methods can be classified as advanced ensemble learning. In the Maximum Voting method shown in Fig. 5 in this study, multiple models are used to make predictions for each data point. The predictions of each model are considered as a “vote.” The predictions obtained from the majority of the models are used as the final prediction [39].

Fig. 5
figure 5

The Maximum Voting ensemble method

Each classifier in the ensemble predicts a class label when a new data point arrives. The class label predicted by the majority of classifiers, or the class label with the most votes, is the label used for this incoming data. The mathematical method used for Maximum Voting ensemble learning is given in Eq. (1) [40]:

$$\tilde{Z} = {\text{argmax}}\left( {X_1 \left( {z_i^1 } \right), X_2 \left( {z_i^2 } \right), \ldots , X_i \left( {z_i^n } \right) } \right).$$
(1)

Performance evaluation

The metrics given in Eqs. (25) were used to calculate the performance of the seven deep learning models used in this study. Accuracy is the value of correctly classified instances within the classified sample data. Precision is the value of how many of the positively classified sample data are actually positive. Recall is the value of how much of the sample data that should be predicted as positive is actually classified as positive. F1-Score is the harmonic mean of the precision and recall metrics [41].

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}},$$
(2)
$${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}},$$
(3)
$${\text{Recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}},$$
(4)
$$F1{\text{ - Score}} = 2 \times \left( {\frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}} \right).$$
(5)

The number of samples that were properly classified as positive is known as True Positive (TP), and the number of samples that were correctly classified as negative is known as True Negative (TN). False-Negative (FN) samples are those that are incorrectly categorized as negative, while False-Positive (FP) samples are those that are incorrectly classified as positive.

Result and discussion

Experimental procedure

Experimental studies were carried out on a virtual machine in a cloud environment. Tesla T4 GPU-CUDA 11.4 and Tensorflow 2.4.1 deep learning library and Python programming language were used. In the experimental study where six deep learning models were used to classify Isparta cherry varieties, the hyper parameters used for model training are given in Table 5. In the experimental study, 224 × 224 × 3 images were transferred to the inputs of the deep learning models.

Table 5 Training hyperparameters

As shown in Fig. 6, in the training phase, transfer learning was performed by freezing the base layers of the pre-trained models. With the fine-tuning process, global_average_pooling, dropout, dense, batch_normalization, dropout and dense layers were added to the base model, respectively. The class value of the output layer of the models was set according to the cherry class number and the activation function was set as softmax. ReLU activation function was used in the intermediate layers. Adam and SGD optimizer were tested and the best values were obtained with Adam optimizer. Categorical Cross Entropy was used as the loss function. Accuracy was chosen as the metric parameter. Different epoch values were tested. To update the learning rate (lr) and to prevent overlearning, reduce_lr_loss, early_stop, and model_check_point controls were added to the model. By training the models, training and validation accuracy were obtained and the confusion matrix was used to obtain the test data and classification performance. Based on the confusion matrix, precision, recall and F1-Score metrics were obtained.

Fig. 6
figure 6

The architecture of our transfer learning model

Comparison of the performance of transfer learning methods

With the dataset created for this study, InceptionV3, DenseNet, ResNet152, VGG19, MobileNet, and NASNet deep learning models were trained both with and without data augmentation. The training result, loss, and accuracy graphs performed with incremental data are given in Fig. 7.

Fig. 7
figure 7

Loss and accuracy graphs of deep learning models after data augmentation

In the training of the models, the training is automatically terminated in cases where the validation loss rate cannot be improved, since the learning rate (lr) is set to the model with the reduce_lr_loss and early_stop parameters. For this reason, since the highest number of revolutions (epoch) determined as the model parameter has not been reached, the number of revolutions in the graphics differs. A number of epochs where the highest validation and highest train accuracy rates of the models are obtained are shown in Table 6.

Table 6 Maximum validation, train accuracy rates, and epoch numbers of the models

The lowest validation loss in training models with augmentation data (0.140 with InceptionV3, 0.093 with ResNet152, 0.056 with VGG19, 0.045 with MobileNet, 0.035 with NASNet, and 0.167 with DenseNet169) and the highest validation accuracy in training models with incremental data (values of 0.938 with DenseNet169, 0.954 with InceptionV3, 0.972 with VGG19, 0.987 with ResNet152, 0.988 with NASNet, and 0.992 with MobileNet) were obtained.

In order to determine the model with the best performance, the values obtained from the precision, recall, and F1-Score metrics obtained from the test results are presented below, respectively. According to the precision values obtained from the test results of the models, it is seen that the best four values are obtained with the DenseNet169 model. It is seen that a cherry variety is classified relatively well with MobileNet, VGG19, and ResNet152 models. After the DenseNet169 model, it is seen that MobileNet and VGG19 models perform a successful classification, respectively. The InceptionV3 and NASNet models achieved the lowest average precision compared to the best model DenseNet169 with a difference of approximately 5%. Among the cherry varieties, the North Wonder class has the highest average precision.

According to the recall metric, the best performance value was found in four of the seven classes of the DenseNet169 model. Here, although the InceptionV3 model was successful in only one cherry class, it had the lowest performance with a difference of about 5% compared to the most successful model. According to the recall metric, the highest average values were obtained in Regina and the lowest in Ziraat900 cherry class.

In Table 7, the highest F1-Score was obtained with the DenseNet169 model. The lowest performance was measured with InceptionV3 and ResNet152 models. The most successful model obtained the highest F1-Score in Lambert, Regina, Sweetheart, and North Wonder classes. It is seen that Ziraat900 has the lowest average F1-Score according to cherry varieties.

Table 7 F1-Score values of the test results

In Table 8, the highest Accuracy value is obtained with the DenseNet169 model as in F1-Score. The lowest performance was measured with InceptionV3 with an average of 0.94 and ResNet152 with an average of 0.96. The cherry varieties with the most successful classification accuracy were Regina, North Wonder, Lambert, Sweetheart, and Alyanak. Stella and Ziraat900 varieties have the lowest Accuracy average value.

Table 8 Accuracy values of the test results

The accuracy rates obtained in the training, validation, and testing phases of six different deep learning models with the augmented dataset are presented in Table 9. With the augmented dataset, an average value of 97.42% was obtained in training accuracy, while the lowest value was obtained with DenseNet169 and the highest value was obtained with the VGG19 model. In the validation accuracy, an average value of 97.22% was obtained. The highest value was obtained with the MobileNet model with a verification value of 99.28%, while the lowest value was obtained with the DenseNet169 model with a difference of 3.62% compared to the average value. When the models were evaluated with test data, the DenseNet169 model was the most successful model with a value of 99.57%. While an average value of 97.76% was obtained in test accuracy, the InceptionV3 model with the lowest value was 3.11% below the average.

Table 9 Train, validation, and test accuracy values of the models

In the confusion matrix, the main diagonal represents the correctly predicted images and the others represent the incorrectly predicted images. When the normalized confusion matrix of the DenseNet169 model with the best test classification value is examined in Fig. 8, it is seen that the test images of Lambert, Regina, North Wonder, and Sweetheart cherry varieties are classified with full accuracy. It is observed that the Stella variety is slightly misclassified with the Ziraat900 variety and the Alyanak variety is misclassified insignificantly with Stella.

Fig. 8
figure 8

Confusion matrix of the DenseNet169 model

Figure 9 gives an example of the classification of cherry images of the DenseNet169 model, which has the best accuracy test rate. Supporting the confusion matrix, the species that is actually Alyanak in only one of the images was predicted as Stella by the DenseNet169 model.

Fig. 9
figure 9

Actual and predicted labels of test images by the DenseNet169 model

When the studies in the literature are examined, it is seen that different fruit and vegetable species are classified [11, 12, 17, 19, 20, 22] or few species belonging to the same fruit are classified [9, 10, 21, 22]. In the studies on cherry fruit, Momeny et al. [23] classified two cherry species and Ropelewska et al. [26] classified three cherry species. Villacrés and Cheein [24], Gai et al. [25], Li et al. [28], and Gai et al. [29] classified the ripeness of the same type of cherries. Reyes et al. [27] classified five different cherry species with only 63 images. It is seen that the studies on cherry fruit are on the classification of the degree of ripeness of the same species or the classification of two or three species. Only Reyes et al. [27] classified five different types of cherries, but only 63 images were used in this study. Unlike the studies in the literature, the present study was conducted for the same fruit type, not for different fruit types. Within the same species, 4 species were used in [9], 2 species were used in [10, 21,22,23], while in the present study, classification was performed for more species (7) belonging to the same fruit.

When the success rates of the studies in the literature are analyzed, it is observed that there are four studies [16, 21, 23, 27] with success rates above 99%. While 926 images were used in the study by Katarzyna and Paweł [16], 3570 images were used in this study. While only two apple species were classified in the study by Bal and Kayaalp [21] and only two cherry species were classified in the study by Momeny et al. [23], seven different cherry species were classified in the present study. The classification algorithms used in these three studies and the present study are CNN based. In the study by Reyes et al. [27], 100% success was achieved with only 63 images, while in the present study, 100% success was achieved using 3570 images.

Ensemble learning was used to further increase the highest correct classification rate (99.57%) obtained in the study. For this purpose, the Maximum Voting ensemble method, which uses more than one model to make predictions for each data point, was preferred. As shown in Table 10, the six deep learning models used for the ensemble model were tested with binary, ternary, and quad combinations. With the ensemble learning model, 98.62% accuracy rate was obtained from InceptionV3 (94.81%) and ResNet152 (97.40%) models with the lowest test accuracy rate. Even this result shows that the accuracy rates are noticeably increased with ensemble learning. The best result obtained from the ensemble model is 100%. This value was obtained from the ensemble learning model using two, four, and six models. In order to obtain the highest accuracy rate using the fewest models, Ensemble learning method, which is a combination of only DenseNet169 and NASNet models, was preferred and 100% accuracy rate was obtained.

Table 10 Transfer learning models used in the ensemble learning method

When the studies in the literature where more than one classification algorithm is used as a hybrid are examined, Vasconez et al. [19] obtained 72.00% success rate by using Faster R-CNN + InceptionV2 algorithms. Macanhã et al. [20] classified 15 different fruits with MLP + KNN algorithms with 97.50% success rate. Bal and Kayaalp [21] classified two apple species with CNN + SVM with 99.70% success rate. Kang and Gwak [22] classified fruits as fresh and rotten with 98.54% success rate using ResNet50 + ResNet101 algorithms as hybrid. Gai et al. [25] classified the ripeness of cherries with YOLOv4 + Dense with a 94.70% success rate. Li et al. [28] used YOLOX-EIoU-CBAM to classify the ripeness of cherries at a rate of 84.96%. In this study, seven different types of cherries were classified by ensemble learning method (DenseNet169 + NASNet) with 100% success rate.

Cherries are generally grown in 35º–55º north and south latitudes in the world. When the data for 2022 are evaluated, the countries exporting the most cherries are Chile, Hong Kong, Turkey, USA, Uzbekistan, Iran, and Spain. Although most cherry varieties can be grown in different countries with the same climatic conditions, different cherry varieties can be grown in different geographical regions in the same country. Although this study was carried out for cherries in Isparta region, it has the potential to be realized for other cherry varieties and other fruits in the world.

Conclusion

In this study, cherry, which is a fruit with high economic value, was classified according to its types with the ensemble learning method. A new dataset consisting of 3570 images containing seven different cherry species was created. By applying the data augmentation technique, DenseNet169, InceptionV3, MobileNet, NASNet, ResNet152, and VGG19 models were trained with both original and incremental datasets.

When the incremental dataset was used, the best test performance was obtained in the DenseNet169 model with 99.57% and the lowest test performance was obtained in the InceptionV3 model with 94.81%. DenseNet169 and NASNet models, which gave the best test accuracy results, were applied to the Maximum Voting ensemble learning model and 100% accuracy rate was obtained. The results show that the proposed ensemble method can be used with cherry species with very high accuracy. Ensemble learning strategies were preferred as they are more successful than the classical single learning approach in terms of better classification results, since by combining multiple models, the errors of a single trigger are likely to be compensated by other causes, and as a result, the overall prediction performance of the ensemble will be better than that of a single cause. And a result confirming this was obtained.

When the literature is examined, it has not been seen that there is a study on the classification of cherry species with ensemble learning methods. Although this study is one of the first studies on this subject, a new dataset has been brought to the literature.

Still, there are limitations of the proposed system, such as obtaining images of 7 different species regionally, and not being able to access other cherry cultivars grown in different cherry producing regions. Additionally, environmental variables such as the inability to adjust the light in the controlled environment also cause difficulties in obtaining clean images. It is seen that the problem is solved by staying within the constraints of the current study. In subsequent studies, cherry images of more species from various locations around the world will be added to the dataset and classification for a much larger number of species will be possible.