1 Introduction

The annual grapevine leaf harvest provides an additional agricultural product, particularly in Turkey. Classifying grapevine leaves is important for assessing their market value and flavor characteristics [1, 2]. Different grapevine varieties exhibit distinct leaf attributes, such as shape, depth, length, featheriness, and slickness, which vary considerably [1, 3, 4]. For this reason, not every variety's leaves are used for culinary purposes. Consumers tend to avoid leaves that are thick, feathered, or slick. The preferred culinary cultivar has a slender, thin leaf without feathers, with delicate venation, and a tangy taste [1]. Hence, distinguishing consumable from non-consumable grapevine leaf varieties and identifying them from leaf and fruit images are crucial tasks in this domain. However, for individuals without specialized knowledge, discerning the grapevine leaf variety is a considerable challenge [1].

Deep learning algorithms are widely used to build predictive classification models. Convolutional neural networks (CNNs), a class of deep learning algorithms, are commonly applied to image classification and prediction in many fields [5,6,7,8,9,10,11,12]. However, CNNs on their own do not always achieve optimal accuracy. CNNs can perform automated feature extraction without manual crafting, and researchers frequently exploit this property [1, 13, 14].

In this study, we prefer pre-trained architectures over developing novel CNNs because of the challenges involved [15]. The selected pre-trained architectures are DarkNet-53 [16], GoogleNet [17], InceptionResNet_v2 [18], NASNetMobile [19], and ResNet-18 [20], chosen because of their frequent use in the field. These architectures serve as deep feature generators, with their final layers producing feature vectors of varying sizes. However, a subset of these features is noisy, and employing the entire set increases computational complexity. Determining the significance of features and performing feature reduction is therefore a crucial problem. To achieve optimal outcomes in image detection, it is advantageous to combine feature selection techniques with machine learning algorithms. Various feature selection methods are available in the literature, including neighborhood component analysis (NCA) [21], principal component analysis (PCA) [22], Chi-square [23], minimum redundancy maximum relevance (mRMR) [24], and proper orthogonal decomposition (POD) [25]. For example, POD is a statistical method that reduces feature size by determining orthonormal basis functions and time-dependent orthonormal amplitude coefficients; it therefore reduces dimensionality through a linear singular value decomposition. Nevertheless, when applying these feature selection methods, researchers may face computational complexity, restrictive assumptions, or time-consuming processes. In this study, we focus on overcoming these issues, particularly those related to computational complexity. The objective of this paper is to propose novel feature selection methodologies based on sampling theory (SRS, RSS, ERSS, and MERSS) and to analyze their impact on classification performance. To compare our proposed methods with an existing one, we use the grapevine leaves dataset. Experimental results show that our proposed methods are superior to the alternatives. The features reduced by our proposed methods are then classified via an ANN. Our approach therefore constitutes a novel hybrid algorithm combining a CNN, a new feature selection method, and an ANN.

In pursuit of this objective, we employ CNN-based methods to discern and classify grapevine leaf images, thereby assisting in the identification of plant species.

1.1 Novelties and Contribution of this Study

Feature selection is an important and frequently used technique for dimension reduction; it removes irrelevant and redundant information from the dataset to obtain an optimal feature subset. It speeds up data mining algorithms, improves learning accuracy, and enhances model comprehensibility. Research in feature selection remains challenging, and some researchers have doubted its computational feasibility [26]. For these reasons, the topic has become one of the key problems in machine learning and data mining. In this paper, we introduce a new approach to feature selection based on sampling theory and investigate its effectiveness. The second important part of this study is the classification of grapevine leaves. Traditional identification methods for grapevine leaves often rely on the knowledge and experience of experts, and identification is difficult [27].

The main contributions and novelties of this study are as follows:

  • Obtained original grapevine leaf images with five classes (Ak, Ala Idris, Büzgülü, Dimnit, and Nazli) from http://www.muratkoklu.com/datasets/Grapevine_Leaves_Image_Dataset.rar

  • Utilized fivefold cross-validation to obtain reliable outcomes for DarkNet-53, GoogleNet, InceptionResNet_v2, NASNetMobile, and ResNet-18.

  • Compared these pre-trained architectures utilizing the softmax layer for the purpose of classifying grapevine leaves images.

  • Employed these architectures for extracting features from images. Extracted features were obtained specifically from the average pooling layer of the respective architecture.

  • Calculated feature weights via Mahalanobis distance metric and ordered the weights in descending order by an inexpensive method.

  • Ordered the features by their weights and associated feature selection with these weights.

  • Proposed novel feature selection methods based on sampling theory: SRS, RSS, ERSS, MERSS.

  • Identified the number of features and selected important features according to the methodological characteristics of the methods.

  • Classified selected features through artificial neural network (ANN).

  • Investigated performance of classification on the hybrid algorithms from grapevine leaf images.

  • Compared these suggested methods with NCA on the performance of classification.

  • Finally, the highest accuracy is obtained by using DarkNet53-MERSS-ANN hybrid algorithm. Figure 1 shows the pipeline of this presented study.

Fig. 1

Pipeline of this study

1.2 Literature Review

In recent years, scientific investigations have primarily concentrated on the examination of disease identification and species classification through the utilization of leaf images, as documented in current collections of literature [1, 28, 29].

Tiwari, et al. [30] developed a deep learning-based system to detect plant diseases and classify various types. They implemented fivefold cross-validation while training the dataset, which has 27 different classes. As a result, they obtained an average cross-validation accuracy of 99.58% and an average test accuracy of 99.199%.

Ahila Priyadharshini, et al. [31] aimed to identify crop disease from maize leaf images via their proposed convolutional neural network (CNN). In fact, they modified LeNet and trained it on four different classes (three disease classes and one healthy class) from the PlantVillage dataset.

Azim, et al. [32] utilized decision trees, which are one of the machine learning algorithms, to detect three different rice leaf diseases from images. They manually extracted features from images such as color, shape, and texture. Lastly, their study achieved an accuracy of 86.58%.

Sembiring, et al. [33] focused on detecting tomato leaf diseases, grouped into nine different classes, via CNN architectures: very deep convolutional networks (VGG), ShuffleNet, and SqueezeNet. They also included healthy leaves as a separate class to distinguish them from the diseased ones. In total, 10 different classes were utilized and classified with these architectures. Finally, the study obtained the highest accuracy of 97.15%.

Zhang et al. [34] proposed a novel approach to detecting cucumber leaf disease. Firstly, they segmented disease by using K-means clustering, and then they extracted features such as shape and color from lesion information. Lastly, they classified leaf images to detect disease utilizing sparse representation (SR). At the end of the study, they obtained an accuracy of 85.7% with this approach.

Sladojevic, et al. [35] created a model using the CNNs algorithm to distinguish 13 different plant diseases from leaf images via Caffe. Finally, their study achieved an average precision of 96.3%.

Kan, et al. [36] investigated the classification of medicinal plants, which are essential in traditional Chinese medicine, via a support vector machine (SVM). Before the classification stage, image features such as shape and texture were extracted for each of the 12 different leaf types. When the features were classified via SVM, the application achieved an average accuracy of 93.3%.

Koklu, et al. [1] performed grapevine leaf image classification using MobileNetv2, one of the pre-trained convolutional neural network architectures. Their dataset consists of five different classes and 500 grapevine leaf images. They first classified the dataset via MobileNetv2 but did not find it sufficient, so they combined it with an SVM to obtain the best classification results. Prior to this combination, the Chi-Square feature selection method was applied, and the most successful SVM kernel was investigated. At the end of the study, they reported that the best kernel was Cubic, with 250 selected features and an accuracy of 97.60%.

Dudi and Rajesh [37] introduced a novel deep learning hybrid algorithm to identify leaf types. Their algorithm includes enhanced CNN with optimization methods for activation functions and hidden neurons. Their proposed method is the Shark Smell-Based Whale Optimization Algorithm (SS-WOA). Besides, they tested this hybrid algorithm on untrained and collected leaf images and obtained an accuracy of 86%. In addition to these studies, Table 1 displays state-of-the-art studies belonging to leaf image classification.

Table 1 State-of-the-art studies on leaf images in the literature

The rest of this study is organized as follows: the grapevine leaves dataset and the methods used are given in Sect. 2. Next, we present the experimental results, performance metrics, fine-tuning parameters, and cross-validation in Sect. 3. Then, we discuss the advantages and disadvantages of this study in Sect. 4. Finally, we conclude the study and outline future work in Sect. 5.

2 Methods

2.1 Dataset of Grapevine Leaves

Plants play an important role in the world [49]. In nature, there are many plant species, and their detection is difficult and time-consuming [50]. The grapevine leaf is a special plant organ with properties such as shape, thickness, featheriness, and slickness, and detecting its variety by the naked eye is quite hard. Traditional identification methods often rely on the knowledge and experience of experts [27].

Bodor-Pesti et al. [51] summarized the efforts to characterize grapevine leaves metrically, introducing the scientific objectives and reviewing studies showing innovations in phenotyping during the past 120 years. The International Organisation of Vine and Wine (OIV) is one of the most important institutions in the viti-viniculture sector, providing statistical data about the world's viticulture and oenology. It organizes events and shares standardized manuals for the description of grapevine genotypes. In 2009, the OIV published the second edition of the "OIV Descriptor List for Grape Varieties and Vitis Species", which contains more than 150 descriptor traits for the purposes of characterization and identification (for more details, see Bodor-Pesti et al. [51]).

The dataset includes grapevine leaf images with five classes: Ak, Ala Idris, Büzgülü, Dimnit, and Nazli. Figure 2 displays the grapevine leaf classes, which were obtained by traditional identification methods that rely on the knowledge and experience of experts. Upon closer examination of Fig. 2, it becomes apparent that there are no easily discernible differences among the various grapevine leaf classes. Identifying grapevine leaves can therefore be challenging for those without expertise in the field.

Fig. 2

Grapevine leaves classes

In total, there are 500 images, 100 for each class. All images are in RGB (red, green, and blue) format with dimensions of 512 × 512 × 3. This dataset was created by Koklu, et al. [1] and obtained from http://www.muratkoklu.com/datasets/Grapevine_Leaves_Image_Dataset.rar.

In this study, we did not resize the images manually during the preprocessing phase. However, each pre-trained architecture accepts a different input size. Therefore, the images are resized automatically, as part of the data augmentation process, to the input size accepted by each pre-trained architecture. Table 2 displays the input size of each architecture, and a minimal resizing sketch follows the table.

Table 2 Properties of pre-trained architectures
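As an illustration of this resizing step, the snippet below shows one way it could be implemented outside MATLAB; the size table, file extension, and paths are assumptions made for the sketch rather than details taken from the original pipeline.

```python
# Minimal sketch: resize the 512 x 512 x 3 grapevine leaf images to each
# architecture's expected input size. The size table, paths, and the *.png
# extension are illustrative assumptions, not the authors' code.
from pathlib import Path
from PIL import Image

INPUT_SIZES = {
    "DarkNet-53": (256, 256),
    "GoogleNet": (224, 224),
    "InceptionResNet_v2": (299, 299),
    "NASNetMobile": (224, 224),
    "ResNet-18": (224, 224),
}

def resize_for(architecture: str, src_dir: str, dst_dir: str) -> None:
    """Resize every image in src_dir to the input size of the given architecture."""
    target = INPUT_SIZES[architecture]
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.png"):
        Image.open(img_path).convert("RGB").resize(target).save(out / img_path.name)

# Example (hypothetical paths):
# resize_for("ResNet-18", "Grapevine_Leaves/Ak", "resized/resnet18/Ak")
```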

2.2 Deep Feature Extractors

With the explosive growth of data and the rapid development of algorithms such as machine learning and deep learning, artificial intelligence (AI) has achieved breakthroughs in a wide range of applications [52]. Notably, researchers prefer deep learning algorithms for image analysis because of their ability to extract features [1, 6, 48, 53,54,55]. When classifying an image with a classical machine learning algorithm, the image features must be extracted manually through a process known as hand-crafting. This is time-consuming and requires expert knowledge. For images from an arbitrary field, it is difficult to find specialists, so results cannot be obtained rapidly. The feature extraction problem can now be handled by CNNs [6].

In this study, we automatically extract features from grapevine images using the networks DarkNet-53, GoogleNet, InceptionResNet_v2, NASNetMobile, and ResNet-18. Table 2 exhibits the number of parameters, the layers, the input size, and the years in which pre-trained architectures have been developed. Furthermore, the next sub-section will provide a concise presentation of these architectures.

2.2.1 DarkNet-53

Darknet-53, a convolutional neural network (CNN) developed by Redmon and Farhadi [16], is the primary module for extracting features in order to identify objects within the Yolov3 network [56]. The architecture comprises a total of 53 deep convolutional layers, and it is denoted as DarkNet-53 due to the specific count of these layers. Indeed, there exist repetition blocks, resulting in a total number of layers amounting to 106. The specified architecture is designed to accommodate an image input with dimensions of 256 × 256. Table 3 presents comprehensive information regarding the architectures. Furthermore, it has been observed that DarkNet-53 exhibits superior performance in the context of classification and extraction of features within the scope of this investigation.

Table 3 DarkNet-53 details [16]

2.2.2 GoogleNet

The GoogleNet architecture was proposed by Szegedy, et al. [17]. The architecture exhibits a multitude of layers, including two convolutional layers, four max-pooling layers, nine inception layers, a global average pooling layer, a dropout layer, a linear layer, and a softmax layer. GoogleNet is composed of a total of 22 layers, which are deeper in nature. It effectively employs activation layers using the rectified linear unit (ReLU) function. GoogleNet consists of a total of 7 million parameters.

2.2.3 InceptionResNet

The InceptionResNet_v2 model is a fusion of the ResNet and Inception architectures, as proposed by Szegedy, et al. [18, 57]. The architecture employs residual connections efficiently, rather than filter concatenation, to enhance performance and accelerate training [58].

2.2.4 NasNetMobile

The NASNetMobile architecture, developed by Zoph, et al. [19], aims to explore optimal CNN structures using reinforcement learning methods. The Google Brain team, as presented by Addagarla, et al. [59], has made significant advancements in Neural Architecture Search (NAS). While NAS architectures vary in size, NasNetMobile represents a scaled-down version. NASNetMobile has approximately 4.5 million parameters, and its accepted input image size is 224 × 224 pixels.

2.2.5 ResNet-18

The architecture known as ResNet-18, as described by He, et al. [20], consists of a total of 72 layers, with 18 of them being deep layers. Moreover, it was developed in the year 2016. This architecture aims to efficiently provide a multitude of convolutional layers for functioning. The core principle of ResNet involves implementing skip connections, commonly referred to as shortcut connections. During this iterative process, the interconnection compresses the underlying structure, leading to accelerated learning within the network. The structure is recognized as a directed acyclic graph (DAG) network due to its intricate layered configuration [60].

These architectures are employed both for classification and for generating deep features. When used as classifiers, the softmax layer is applied. When used as deep feature generators, the softmax layer is omitted and the deep features are obtained from the last layers, typically the pooling layers (a minimal extraction sketch is given at the end of this subsection). For example, DarkNet-53 yields a feature vector of dimension 1024. However, the number of features must be reduced to attain optimal performance. In this study, our objective is to design innovative feature selection techniques.
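As a rough illustration of this feature-generation step, the sketch below extracts average-pooling features from a pre-trained ResNet-18 in PyTorch; this is only an assumed stand-in for the MATLAB workflow used in the study, not the authors' code.

```python
# Rough PyTorch/torchvision stand-in for the deep feature-generation step
# (not the authors' code): extract the global-average-pooling output of a
# pre-trained ResNet-18, giving one 512-dimensional feature vector per image.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()
# Everything up to and including avgpool, i.e. the network without its fc layer.
backbone = torch.nn.Sequential(*list(model.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(image_path: str) -> torch.Tensor:
    """Return the 512-dim average-pooling feature vector for one leaf image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(x).flatten(1).squeeze(0)      # shape: (512,)
```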

2.3 A Novel Feature Selection Approach Based on Sampling Theory

In this section, we aim to address the problems noted in the Introduction, with special emphasis on computational complexity. Therefore, we present novel feature selection methodologies, SRS, RSS, ERSS, and MERSS, based on sampling theory to improve the classification performance on grapevine leaf images. In the following sub-sections, we introduce the proposed methods and the overall algorithm.

2.3.1 Simple Random Sampling (SRS)

Simple random sampling (SRS) is a very common sampling design used by many researchers because of its practicality. New sampling designs have also been suggested in the literature to improve precision. One of these designs, ranked set sampling (RSS), was first proposed by McIntyre [61] as an alternative to SRS. When both sampling designs are compared for the same sample size, RSS becomes more efficient than SRS as long as an accurate and accessible ranking criterion is available, which is the case here for increasing grapevine leaf classification performance. For a detailed literature review, see Zamanzade and Mahdizadeh [62], Bouza-Herrera and Al-Omari [63], and Koyuncu and Al-Omari [64]. In the big data literature, there are many important studies that use sampling designs to reduce computational complexity, handle imbalanced data, and increase precision [65, 66]. In this study, following these ranked set sampling designs, we propose new procedures for selecting features based on their weights; a minimal SRS-based selection sketch is given below.
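The following sketch reflects our reading of SRS-based feature selection (drawing n feature indices at random, without replacement); the feature count in the example is illustrative only.

```python
# Minimal sketch of SRS-based feature selection under our reading of the design
# (not the authors' code): draw n feature indices at random, without replacement.
import numpy as np

def srs_select(num_features: int, n: int, seed: int = 0) -> np.ndarray:
    """Return n feature indices chosen by simple random sampling."""
    rng = np.random.default_rng(seed)
    return rng.choice(num_features, size=n, replace=False)

# Example (illustrative): pick 36 of the 1024 DarkNet-53 features.
# selected = srs_select(1024, 36)
```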

2.3.2 Ranked Set Sampling (RSS) Procedure for Feature Selection

Following McIntyre [61], we can define a new procedure for feature selection as:

  1. Order the weights in descending order by an inexpensive method.

  2. Select n sets of features, each of size n, from the ordered list.

  3. Measure accurately the first ordered feature from the first set, the second ordered feature from the second set, and so on, until the n-th ordered feature from the last (n-th) set is measured.

Note that this procedure selects Integer(sqrt(weight_size)) features, where weight_size denotes the total number of feature weights. A minimal sketch of the procedure is given below.
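Under these steps, and with n taken as Integer(sqrt(weight_size)) unless specified otherwise, the selection could be sketched as follows (our interpretation, not the authors' code):

```python
# Minimal sketch of the RSS procedure above (our interpretation, not the
# authors' code). Weights are sorted in descending order, the first n*n of them
# form n sets of size n, and the i-th ranked feature is taken from the i-th set.
import numpy as np

def rss_select(weights: np.ndarray, n: int = 0) -> np.ndarray:
    """Return indices of the features chosen by the RSS procedure."""
    order = np.argsort(weights)[::-1]          # feature indices, best weight first
    if n <= 0:
        n = int(np.sqrt(weights.size))         # Integer(sqrt(weight_size)), per the note
    sets = order[: n * n].reshape(n, n)        # n sets of n features each
    return sets[np.arange(n), np.arange(n)]    # i-th ordered feature of the i-th set

# Example (illustrative): selected = rss_select(feature_weights)
```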

2.3.3 Extreme Ranked Set Sampling (ERSS) Procedure for Feature Selection

When the set size n is large, RSS may suffer from ranking errors. Several variations of RSS have been proposed to overcome this problem. The main idea of ERSS is that identifying the maximum rank is much easier than determining all the ranks [67]. We define a new procedure, called the ERSS procedure, for feature selection as follows:

  1. Order the weights in descending order by an inexpensive method.

  2. Select n sets of features, each of size n, from the ordered list.

  3. Measure accurately the maximum ordered feature from the first set, then the maximum ordered feature from the second set, and so on, until the maximum ordered feature from the last (n-th) set is measured.

Note that this procedure selects the same number of features as RSS; unlike the classical ERSS defined by Samawi, et al. [67], our approach only takes maximum values into account instead of minimum values. A minimal sketch is given below.
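A corresponding sketch of the ERSS variant, again our interpretation rather than the authors' code, differs from the RSS sketch only in keeping the maximum-weight feature of each set:

```python
# Minimal sketch of the ERSS variant above (our interpretation, not the
# authors' code): the sets are built as in RSS, but only the maximum-weight
# feature of each set is kept.
import numpy as np

def erss_select(weights: np.ndarray, n: int = 0) -> np.ndarray:
    """Return indices of the maximum-ranked feature from each of the n sets."""
    order = np.argsort(weights)[::-1]
    if n <= 0:
        n = int(np.sqrt(weights.size))
    sets = order[: n * n].reshape(n, n)
    return sets[:, 0]      # weights are descending, so column 0 is each set's maximum
```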

2.3.4 Moving Extreme Ranked Set Sampling (MERSS) Procedure for Feature Selection

Another modification of RSS, namely moving extreme ranked set sampling (MERSS), was introduced by Al-Odat and Al-Saleh [68]. Following Al-Odat and Al-Saleh [68], we suggest the following procedure to select features based on their weights (see the sketch after this list):

  • Order the weights in descending order by an inexpensive method.

  • Select n sets of features of sizes 1, 2, 3, …, n, respectively, from the ordered list.

  • Measure accurately the maximum ordered feature from the first set, the maximum ordered feature from the second set. The process continues in this way until the maximum ordered feature from the last n-th set is measured.

This modification of RSS, in addition to being easier to execute than both the usual RSS and fixed-size extreme RSS, keeps some of the balance inherent in the usual RSS. Hence, the MERSS algorithm exhibits superior efficiency compared to the other methods in the task of feature selection, resulting in significantly enhanced performance in the classification of grapevine leaf images.
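A minimal sketch of the MERSS procedure, as we interpret the steps above, is given below; the set size n is a free parameter here, and the sketch assumes n(n + 1)/2 does not exceed the number of available features.

```python
# Minimal sketch of the MERSS procedure above (our interpretation, not the
# authors' code): consecutive sets of sizes 1, 2, ..., n are formed from the
# descending-ordered weights and the maximum-weight feature of each set is kept.
import numpy as np

def merss_select(weights: np.ndarray, n: int) -> np.ndarray:
    """Return the index of the maximum-weight feature from each moving set."""
    order = np.argsort(weights)[::-1]
    selected, start = [], 0
    for size in range(1, n + 1):
        subset = order[start:start + size]
        selected.append(subset[0])     # first element of each set is its maximum
        start += size
    return np.array(selected)
```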

According to our proposed feature selection methods, we select significant features, which are subsequently subjected to classification using an ANN with high efficiency.

2.4 Classification via Artificial Neural Network (ANN)

Artificial neural networks (ANNs), as introduced by McCulloch and Pitts [69], emerged as a result of studying brain functionality and subsequently found application in computer programs [6, 70, 71]. In addition, it is important to note that any ANN comprises numerous individual units, commonly referred to as neurons or processing elements (PE). These units are interconnected through weights, which facilitate the neural structure of the network. Furthermore, these interconnected units are typically organized in layers to ensure proper coordination and functioning of the ANN [72].

ANNs come from a successful lineage of nonlinear algorithms. When used for machine learning, particularly supervised learning, they have achieved considerable success in recent times. Additionally, artificial neural networks (ANNs) possess a flexible architecture that can be effectively employed for a wide range of real-world datasets [73]. The reader may refer to the book by H. Jiang (2021) for further details.

Based on the aforementioned considerations, an ANN is implemented in this study for the efficient classification of grapevine leaf images. This decision is based on the fundamental principles established by Ozaltin, et al. [6]. The chosen ANN configuration has 100 hidden layers with 5 neurons each, followed by a softmax layer. Furthermore, a ReLU activation layer is employed to enhance the efficiency of the algorithm. The limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) optimization algorithm, from the quasi-Newton family, is selected as the training solver. Additionally, the maximum number of iterations is set to 1000, while the learning rate, minimum gradient tolerance, and loss tolerance are set to 0.02, 1e-4, and 1e-5, respectively. By carefully choosing these parameters, improved classification performance can be achieved. A rough analogue of this configuration is sketched below.
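For illustration, a rough scikit-learn analogue of this configuration is given below; the study itself was implemented in MATLAB, so the parameter names and behaviour (e.g., a single tolerance value, the learning rate being ignored by LBFGS) only approximate the described setup.

```python
# Rough scikit-learn analogue of the ANN configuration described above (an
# assumed equivalent, not the authors' code).
from sklearn.neural_network import MLPClassifier

ann = MLPClassifier(
    hidden_layer_sizes=(5,) * 100,   # 100 hidden layers with 5 neurons each, as stated
    activation="relu",               # ReLU activation
    solver="lbfgs",                  # limited-memory BFGS, quasi-Newton family
    max_iter=1000,                   # maximum number of iterations
    tol=1e-5,                        # sklearn exposes a single tolerance value
    learning_rate_init=0.02,         # stated learning rate; ignored by the lbfgs solver
    random_state=0,
)
# Usage (hypothetical names): ann.fit(selected_train_features, train_labels)
```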

2.5 Performance Metrics

In this study, accuracy, area under the curve (AUC), F1-measure, geometric mean (G-mean), kappa value, precision, and recall are used to measure the algorithms' performance; accuracy, F1-measure, recall, precision, and G-mean are expressed in Eqs. (1)–(5) as follows:

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$
(1)
$$F1\text{-Measure} = \frac{2 \times \text{TP}}{2 \times \text{TP} + \text{FP} + \text{FN}}$$
(2)
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
(3)
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
(4)
$$G\text{-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}$$
(5)

where \(\text{TP}\) denotes true positives, \(\text{FP}\) false positives, \(\text{TN}\) true negatives, and \(\text{FN}\) false negatives [8, 71, 74, 75].

The above performance metrics are widely employed to evaluate classifier performance. In this study, in addition to these metrics, the kappa value (\(\kappa\)) is computed to judge whether the algorithms' performances are acceptable. If \(\kappa\) is close to 1, the results can be regarded as perfect; if \(\kappa\) is close to 0, the results are unacceptable [76]. Equations (6) and (7) are used to evaluate \(\kappa\), as shown in Eq. (8).

$$p_{\text{A}} = \text{Accuracy}$$
(6)
$$p_{\text{E}} = \frac{(\text{TP} + \text{TN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{TP} + \text{FN})}{(\text{TP} + \text{TN} + \text{FP} + \text{FN})^{2}}$$
(7)
$$\kappa = \max\left(\frac{p_{\text{A}} - p_{\text{E}}}{1 - p_{\text{E}}};\; \frac{p_{\text{E}} - p_{\text{A}}}{1 - p_{\text{A}}}\right)$$
(8)
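For reference, Eqs. (1)–(8) can be computed directly from the confusion-matrix counts; the sketch below does so for a binary (one-vs-rest) case and is only an illustrative implementation.

```python
# Illustrative implementation of Eqs. (1)-(8) from the confusion-matrix counts
# of a binary (one-vs-rest) problem; in the multi-class setting these values
# would be computed per class and averaged.
from math import sqrt

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                                         # Eq. (1)
    f1 = 2 * tp / (2 * tp + fp + fn)                                     # Eq. (2)
    recall = tp / (tp + fn)                                              # Eq. (3), sensitivity
    precision = tp / (tp + fp)                                           # Eq. (4)
    specificity = tn / (tn + fp)
    g_mean = sqrt(recall * specificity)                                  # Eq. (5)
    p_a = accuracy                                                       # Eq. (6)
    p_e = ((tp + tn) * (tp + fp) + (fp + tn) * (tp + fn)) / total ** 2   # Eq. (7)
    kappa = max((p_a - p_e) / (1 - p_e), (p_e - p_a) / (1 - p_a))        # Eq. (8)
    return {"accuracy": accuracy, "f1": f1, "recall": recall, "precision": precision,
            "specificity": specificity, "g_mean": g_mean, "kappa": kappa}
```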

2.6 Cross Validation

Deep learning algorithms cannot be expressed coherently as explicit mathematical models, so it is not clear how inputs are transformed into outputs [77]. Hence, such algorithms are referred to as black boxes. The k-fold cross-validation method is commonly preferred by researchers to obtain reliable outcomes [6, 78,79,80]. Moreover, it effectively mitigates overfitting during data analysis [80]. In this procedure, the dataset is randomly partitioned into k subsets; one subset is designated as the test set, while the remaining subsets are used for training [81]. The procedure is repeated for k folds and evaluated using the proposed framework [82]. In this study, k is set to 5 to ensure reliable classification outcomes. A minimal sketch of this partitioning is shown below.
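The sketch below illustrates the fivefold split using scikit-learn's StratifiedKFold as an assumed stand-in for the MATLAB partitioning (not the authors' code).

```python
# Minimal sketch of fivefold cross-validation with stratified splits; an
# assumed stand-in for the MATLAB partitioning, not the authors' code.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_indices(labels: np.ndarray, seed: int = 0):
    """Yield (train_idx, test_idx) index pairs for k = 5 stratified folds."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    dummy_x = np.zeros((len(labels), 1))   # features are not needed to build the split
    yield from skf.split(dummy_x, labels)

# Example with the 500-image, 5-class dataset (100 images per class):
# labels = np.repeat(np.arange(5), 100)
# for train_idx, test_idx in five_fold_indices(labels):
#     ...  # train on train_idx, evaluate on test_idx
```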

2.7 Fine Tuning Parameters

In this study, the determination of fine-tuning parameters is crucial for attaining optimal outcomes and ensuring a fair comparison among DarkNet-53, GoogleNet, NasNetMobile, InceptionResNet_v2, and ResNet-18. The fine-tuning parameters are as follows: stochastic gradient descent with momentum (sgdm) is used as the optimizer, the learning rate is set to 0.0001, the maximum number of epochs is 10, the mini-batch size is 8, and a constant learning-rate schedule is utilized. An illustrative sketch of this setup is given below.
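The following PyTorch sketch restates this training configuration; the momentum value of 0.9 is an assumption (the text specifies only "sgdm"), while the learning rate, epoch count, mini-batch size, and constant schedule follow the text.

```python
# Illustrative PyTorch sketch of the fine-tuning setup (the study used MATLAB's
# "sgdm" solver; the 0.9 momentum below is an assumption).
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

def fine_tune(model: nn.Module, train_set: Dataset) -> nn.Module:
    loader = DataLoader(train_set, batch_size=8, shuffle=True)   # mini-batch size 8
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(10):                                          # max 10 epochs, constant LR
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```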

Following the description of the methods and parameters, the pseudocode covering the entire algorithm is given below.

Algorithm

The procedure of this study for detecting grapevine leaf image types is as follows:

  1. Start with the input color grapevine leaf images (512 × 512 × 3).

  2. Resize the images according to each architecture's accepted input dimension.

  3. Train the images with fivefold cross-validation using DarkNet-53, GoogleNet, InceptionResNet_v2, NasNetMobile, and ResNet-18 with the fine-tuning parameters.

  4. Calculate performance metrics to compare the algorithms.

  5. Save the network.

  6. Split the images into 70% training and 30% testing.

  7. Activate the network.

  8. Use DarkNet-53, GoogleNet, InceptionResNet_v2, NasNetMobile, and ResNet-18 to reduce the high dimensionality of the images and extract features from the average pooling layer.

  9. Obtain "n" feature weights via the Mahalanobis distance metric.

  10. Apply the suggested methods (ERSS, MERSS, RSS, and SRS) and NCA.

  11. Take the selected features.

  12. Classify with ANN to detect the types of grapevine leaf images.

  13. Calculate performance metrics.

  14. Save the results.

  15. Find the best structure according to the performance metrics.

  16. End.

3 Results

3.1 Experimental Results

The experiments were carried out in MATLAB 2021b. The primary objective of this study is to propose new feature selection methods and to identify grapevine leaf types using the developed hybrid algorithms. First, the dataset is acquired from the public website indicated in the dataset section. The models DarkNet-53, GoogleNet, NasNetMobile, InceptionResNet_v2, and ResNet-18 are utilized as automatic feature extractors using a fivefold cross-validation and transfer learning approach, with each model evaluated individually. In the subsequent procedure, we derive feature weights from the final layer of the relevant architecture. The features are then chosen using the recommended techniques: ERSS, MERSS, RSS, and SRS. Meanwhile, we employ an ANN to classify all the selected features. Table 4 demonstrates the performance of these pre-trained architectures in classifying grapevine leaf images using fivefold cross-validation.

Table 4 Performance of pre-trained architectures on grapevine leaf images using fivefold cross-validation

Based on the data presented in Table 4, it can be observed that the utilization of pre-trained architectures has resulted in highly successful performances. The empirical findings suggest that DarkNet-53 achieves the most optimal performance, exhibiting an accuracy rate of 96.20% during this particular stage. The second model is ResNet-18, which exhibits a commendable accuracy of 95%. Following the series of performances, InceptionResNet_v2, NASNetMobile, and GoogleNet exhibit accuracies of 89.8%, 88.2%, and 86% in that order.

Despite the satisfactory nature of the performances, we employ the feature selection technique and machine learning algorithm to ascertain the credibility of the research findings. In the current phase of this study, we have partitioned the dataset into a 70% training subset and a 30% testing subset subsequent to training the dataset using the relevant architecture. Subsequently, we extract features from the last layer (which varies depending on the specific architecture) and subsequently employ our proposed methodologies, namely ERSS, MERSS, RSS, and SRS, to select the desired features. The classification stage has commenced using an ANN, and we have evaluated all hybrid structures. The outcomes of these evaluations are documented in Tables 5, 6, 7, 8, 9.

Table 5 Hybrid algorithm performance with DarkNet53, suggested methods, and ANN
Table 6 Hybrid algorithm performance with GoogleNet, suggested methods, and ANN
Table 7 Hybrid algorithm performance via NasNetMobile, suggested methods, and ANN
Table 8 Hybrid algorithm performance using InceptionResNet, suggested methods, and ANN
Table 9 Hybrid algorithm performance using ResNet-18, suggested methods, and ANN

When extracting features from DarkNet-53, it has been observed that the global average pooling layer, commonly referred to as 'gap', proves to be beneficial. A total of 1024 features are acquired from the layer, and a subset of 36 features is selected using the ERSS, RSS, and SRS techniques. Furthermore, a total of 45 significant features have been determined using the MERSS method. The output is presented in Table 5.

Based on the findings presented in Table 5, the application of our proposed methodologies yields efficient results. It is worth noting that the performances obtained are in the form of test results. We are pleased to report that these results exhibit a high level of confidence in accurately identifying different types of grapevine leaf. The DarkNet53-MERSS-ANN algorithm has achieved the highest test accuracy of 97.33% along with other metrics. Additionally, the kappa value approaches 1, indicating that the algorithm can be considered highly successful. Furthermore, the performance is enhanced with the proposed methodology. In the previous iteration, the DarkNet-53 model demonstrated a classification accuracy of 96.20%. Furthermore, the algorithm employed in this study yields the utmost test accuracy when tasked with classifying grapevine leaf images. Additionally, Fig. 3 depicts the confusion matrix obtained from the DarkNet53-MERSS-ANN algorithm. Figure 4 displays the ANN training accuracy and loss graph after the feature selection process.

Fig. 3

Confusion matrix of DarkNet53-MERSS-ANN algorithm

Fig. 4

ANN with 5 neurons per layer and 100 hidden layers

The next model considered is GoogleNet. When features are extracted from the GoogleNet model, the average pooling layer named 'pool5-7x7_s1' is observed to be useful. A total of 1024 features are extracted from this layer, and the 36 most significant features are then selected using the ERSS, RSS, and SRS techniques. Furthermore, a total of 45 significant features are identified using the MERSS method. The output of the computation is displayed in Table 6.

Table 6 indicates that the GoogleNet-RSS-ANN algorithm, with a test accuracy of 95.33%, is the most accurate of the methods we have proposed. The values for the sensitivity, G-Mean, F-measure, kappa value, and AUC are 95.33%, 97.07%, 0.9533, 0.8542, and 0.9946, respectively. In addition, the kappa value is close to 1, which indicates that the algorithm is quite successful. Moreover, the performance is enhanced by the suggested method. Previously, the accuracy of GoogleNet was 86%.

The next model is NasNetMobile. To collect features from NasNetMobile, the global average pooling layer denoted 'global_average_pooling2d_1' can be employed. A total of 1056 features are extracted from this layer, and a subset of 36 features is selected using the ERSS, RSS, and SRS methods. Additionally, the MERSS method is utilized to identify a total of 45 crucial features. The outcomes are presented in Table 7.

Table 7 shows that, among our suggested methods, the NasNetMobile-MERSS-ANN algorithm performs best, with a test accuracy of 79.33%. Furthermore, its sensitivity, G-Mean, F-measure, kappa value, and AUC are 79.33%, 86.74%, 0.7894, 0.3542, and 0.9375, respectively. The kappa value is far from 1, which indicates that this algorithm is not preferable. NasNetMobile previously had an accuracy of 88.20%.

The next architecture is InceptionResNet_v2. When features are obtained from it, the global average pooling layer known as 'avg-pool' is found to be useful. Essentially, 1536 features are extracted from this layer, and the 53 most significant features are selected using the ERSS, RSS, and SRS methods. The MERSS method identifies 55 essential features. Table 8 displays all of the results.

Table 8 shows that, when using our recommended techniques, two algorithms, InceptionResNet-ERSS-ANN and InceptionResNet-MERSS-ANN, perform best, with a test accuracy of 92.67%. Additionally, they achieve the same values for sensitivity, G-Mean, F-measure, kappa value, and AUC, which are 92.67%, 95.38%, 0.9266, 0.7708, and 0.9953, respectively. InceptionResNet previously achieved an accuracy of 89.90%.

ResNet-18 is the last architecture. When features are taken from it, the 'pool5' average pooling layer is found to be useful. In essence, 512 features are collected from this layer, and the 24 most significant features are chosen using the ERSS, RSS, and SRS methods. The MERSS method selects 32 significant features. Table 9 displays the complete results.

Analyzing our suggested methods, Table 9 shows that ResNet18-MERSS-ANN performs best, with a test accuracy of 85.33%. Additionally, its sensitivity, G-Mean, F-measure, kappa value, and AUC are 85.33%, 90.67%, 0.8531, 0.5417, and 0.9761, respectively. Prior to this, ResNet-18 achieved a 95% accuracy rate.

In recent years, numerous researchers have employed various feature selection methods in their studies [1, 6, 83]. However, identifying a suitable feature selection method is not straightforward, as certain methods rely on underlying assumptions. This study proposes several practical methods and compares them with neighborhood component analysis (NCA), a nonparametric method that operates without any assumptions. The results of the various combinations with NCA are presented in Table 10.

Table 10 Performance of hybrid algorithm based on NCA and ANN combination

Based on the findings presented in Table 10, DarkNet53-NCA-ANN emerges as the most effective NCA-based combination, with a notable accuracy rate of 96.67%. In addition, its sensitivity, G-Mean, F-measure, kappa value, and AUC are 96.67%, 97.90%, 0.9664, 0.8958, and 1.00, respectively. The confusion matrix of the DarkNet53-NCA-ANN model is depicted in Fig. 5.

Fig. 5

Confusion matrix of DarkNet53-NCA-ANN

DarkNet-53 architecture is employed as a deep feature extractor for grapevine leaf images. To enhance the performance of the feature extraction process, the most effective feature selection method, known as MERSS, is utilized. Through the application of MERSS on the extracted features, a notable accuracy rate of 97.33% is achieved. In contrast to NCA, MERSS demonstrates superior performance. Hence, it can be asserted that the feature selection method we have developed exhibits the highest level of performance.

GoogleNet is used as a deep feature extractor for grapevine leaf images. The feature selection method that yields the most successful result is RSS, which, applied to the extracted features, achieves an accuracy of 95.33%. In this setting, NCA performs worse than RSS. Hence, with GoogleNet as the feature extractor, our proposed method is again the best choice.

InceptionResNet_v2 is implemented as a deep feature extractor for grapevine leaf images. The successful feature selection methods are MERSS and ERSS, which operate on the extracted features and achieve an accuracy of 92.67%. NCA performs worse than MERSS and ERSS in this setting, so our methods are again the best choice when InceptionResNet_v2 is used.

NasNetMobile operates as a deep feature extractor for grapevine leaf images. The top feature selection method is MERSS, which is applied to the extracted features and results in an accuracy of 79.33%. Compared with NCA, MERSS exhibits better performance. In conclusion, when utilizing NasNetMobile, our solution proves to be the most effective once again.

ResNet-18 is utilized as a deep feature extractor for grapevine leaf images. The feature selection method employed is MERSS, which is applied to the extracted features and results in an accuracy of 85.33%. Compared with NCA, MERSS has lower performance in this case. Based on the results as a whole, it can be concluded that MERSS performs well when combined with the deep feature extractors and, as mentioned earlier, generally outperforms NCA.

4 Discussion

Some advantages and disadvantages are discussed in this section of the study. The following are the primary benefits of the study:

  (i) Extensive comparisons are made using pre-trained architectures such as DarkNet-53, GoogleNet, NasNetMobile, InceptionResNet_v2, and ResNet-18.

  (ii) To achieve reliable results, each architecture uses fivefold cross-validation, and all results are measured using accuracy, sensitivity, G-Mean, F1-measure, kappa value, and AUC.

  (iii) To improve classification performance on grapevine leaf images, pre-trained architectures are used as automatic deep feature extractors from the final layers (pooling layers are used). This is critical because expert opinions are not required.

  (iv) To reduce dimensionality and select significant features, we propose novel sampling theory-based methods that ensure reliable study results.

  (v) Finally, the ANN, with 100 hidden layers of 5 neurons each, is an excellent classification algorithm for detecting different types of grapevine leaf images. In addition, to evaluate the proposed methods, we compared them to NCA, a widely used feature selection method. Our proposed methods outperform NCA.

The following are the study's drawbacks:

  (i) The grapevine leaf images are limited in number and are investigated only as a balanced dataset.

5 Conclusion

Deep learning implementations continue to improve, and this progress is increasingly visible in agriculture and vegetation-related applications. With this viewpoint, we have successfully extracted features from images of grapevine leaves in order to categorize their species using pre-trained architectures. First, using fivefold cross-validation, DarkNet-53, GoogleNet, InceptionResNet_v2, NasNetMobile, and ResNet-18 are used to classify the grapevine leaf images directly. The accuracy of DarkNet-53 in this part of the study is 96.20%. Although the results are good, we investigate how they can be improved and propose new feature selection techniques based on sampling theory. In the next part of this study, the pre-trained architectures are used as feature extractors, and features are taken automatically from their final average pooling layers. Not all of these features, however, carry crucial information about the images. We propose SRS, RSS, ERSS, and MERSS as four feature selection methods to choose significant features. Additionally, the methodology of each sampling design determines how many features are chosen, which allows effective classification with a minimal set of features. Finally, an ANN is used to classify these features. In brief, the following outcomes are attained:

DarkNet-53 is used as a deep feature extractor, and the MERSS feature selection method with an ANN classifier yields a maximum accuracy of 97.33%. When GoogleNet is used as a deep feature extractor, the RSS feature selection method with an ANN classifier achieves the highest accuracy, 95.33%. An accuracy of 92.67% is attained when InceptionResNet_v2 is used as a deep feature extractor with the ERSS and MERSS feature selection methods and an ANN classifier. The accuracy reaches 79.33% when NasNetMobile is used as a deep feature extractor in combination with the MERSS feature selection method and an ANN classifier. With ResNet-18 as a deep feature extractor, the MERSS feature selection method and ANN classifier give an accuracy of 85.33%. Finally, a comparison with NCA has been made, and our suggested methods, specifically the MERSS method, are superior to it under comparable circumstances. As a result, performance is effectively improved by the proposed hybrid algorithms, and the results are reliable. Using DarkNet53-MERSS-ANN, the study's best performance is achieved, with an accuracy of 97.33% when identifying different grapevine leaf types from images. Finally, we can affirm that the structure we created performs excellently.

5.1 What Happens in the Next Study?

This study demonstrates how pre-trained architectures can identify different plant species from images. Image classification of plants therefore provides automatic species identification for experts, farmers, and researchers. Moreover, thanks to the suggested algorithm, people no longer need to spend a great deal of time identifying plant species. We can say that the created structure is capable of performing plant detection and can be advanced further. In future work, we will apply the suggested feature selection techniques to a variety of datasets from various fields.