1 Introduction

Human beings rely heavily on plants for their survival. Besides providing us with food and oxygen, some plants also possess medicinal properties that can heal many diseases. The antibacterial activity found in 12 Indian medicinal plants provides inspiration for developing drug compounds for human health [1], with therapeutic uses such as controlling diarrhoea, nervous disorders, leprosy, and leukoderma. Medicinal plants have long been an active topic of research, as medicines made from them are inexpensive and are generally considered to have few side effects. Some plants also possess anti-colorectal-cancer properties that may provide better outcomes than current chemotherapy treatments [2]. Plant-derived medicines have also proved effective against COVID-19 [3] and diseases such as obesity [4]. Ongoing research on deriving medicines from medicinal plants highlights another important need: the automatic detection of medicinal plants, which would let ordinary people easily identify plants growing in their vicinity and make the most of their health benefits. Although plants can be identified from any part, including the stem, fruit, or bark, the leaf has been widely used because it is available in all seasons [5].

Machine learning uses image features to distinguish one class (species) of leaves from another. A leaf image has color, texture, venation, and shape features. The feature vector can be constructed from shape, texture, or venation features alone, or from a combination of several of them. Once the leaf image features are extracted and the feature vector is obtained, it is input into a classifier that assigns the image to its species. Feature extraction and classification techniques used in recent years are reviewed in [6]. Deep learning, another branch of artificial intelligence, imitates the human brain to recognize patterns in images and distinguish one class of images from another.
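As a minimal sketch of this classical pipeline, the code below concatenates per-leaf shape, texture, and color descriptors into one feature vector and assigns a leaf to the class whose mean feature vector is nearest. The descriptor values, the species labels "A" and "B", and the choice of a nearest-centroid classifier are illustrative assumptions, not methods taken from [6].

```python
import math

def build_feature_vector(shape, texture, color):
    # Combine several hand-crafted descriptors by simple concatenation.
    return list(shape) + list(texture) + list(color)

def fit_centroids(vectors, labels):
    # "Training" a nearest-centroid classifier: mean feature vector per class.
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for j, x in enumerate(v):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(vector, centroids):
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda y: math.dist(vector, centroids[y]))

# Hypothetical descriptors for two hypothetical species "A" and "B".
train = [build_feature_vector([0.9, 0.2], [0.1], [0.3]),
         build_feature_vector([0.8, 0.3], [0.2], [0.4]),
         build_feature_vector([0.2, 0.9], [0.7], [0.6]),
         build_feature_vector([0.3, 0.8], [0.8], [0.5])]
centroids = fit_centroids(train, ["A", "A", "B", "B"])
print(classify(build_feature_vector([0.85, 0.25], [0.15], [0.35]), centroids))  # → A
```

Any other classifier (SVM, k-NN, ANN) could replace `classify`; the point is only that the combined feature vector is what the classifier sees.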

In this paper, we explore transfer learning and ensemble learning to classify medicinal leaf images into 30 classes. The transfer learning approach uses existing knowledge to solve problems in other domains [7]. It has been widely used in image recognition, where Convolutional Neural Network models pre-trained on high-end GPUs can classify objects into 1000 classes. The knowledge acquired by these networks can be transferred to medicinal leaf image recognition. We trained three models, namely MobileNetV2, InceptionV3, and ResNet50, on the medicinal leaf dataset, evaluated their performance, and finally combined them with ensemble learning based on weighted averages of the component models' outputs. The rest of the paper is organized as follows: Section 2 reviews the literature on deep learning for the automatic detection of medicinal leaves. Section 3 describes the proposed ensemble deep learning framework. Section 4 gives an overview of EDL-AMLI. Section 5 presents the experiments and results, and Section 6 gives the conclusion and future scope.

2 Related work

The Convolutional Neural Network extracts low-level features from the leaf image, followed by mid-level features and finally class-specific features at the last layer [8]. The learned features reflect not only the shape but also structural divisions, leaf tip and base, margin types, etc. Experiments were conducted with whole-leaf images and venation patches, concluding that global features were more visible in whole-leaf images, while venation patterns formed the basis for differentiation in the patch images. The authors also built a hybrid feature extraction model in which the CNN architecture was divided into two networks, a local and a global network, whose features were combined using early and late fusion before classification with a softmax classifier. Deep learning was also used to classify three legume species, namely red bean, soybean, and white bean [9]. The authors extracted a central patch from the vein segmentation of the RGB leaf image and then used two experimental setups for species identification.
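The difference between the two fusion strategies mentioned for [8] can be sketched as follows, assuming a local (venation-patch) branch and a global (whole-leaf) branch that each already produce a feature vector. Early fusion concatenates the two feature vectors before a single softmax classifier; late fusion classifies each branch separately and averages the class probabilities. The feature values and classifier weights are hypothetical, and this is not the architecture of [8].

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, features):
    # One logit per class: dot product of that class's weight row with the features.
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def early_fusion(local_feat, global_feat, weights):
    # Concatenate the branch features first, then classify once.
    return softmax(linear(weights, local_feat + global_feat))

def late_fusion(local_feat, global_feat, w_local, w_global):
    # Classify each branch separately, then average the class probabilities.
    p_local = softmax(linear(w_local, local_feat))
    p_global = softmax(linear(w_global, global_feat))
    return [(a + b) / 2 for a, b in zip(p_local, p_global)]

local_feat, global_feat = [1.0, 0.0], [0.5, 0.5]        # hypothetical branch outputs
w_early = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # 2 classes x 4 fused features
w_local = [[1.0, 0.0], [0.0, 1.0]]
w_global = [[1.0, 0.0], [0.0, 1.0]]
print(early_fusion(local_feat, global_feat, w_early))
print(late_fusion(local_feat, global_feat, w_local, w_global))
```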

A CNN feature extractor combined with a Random Forest or Logistic Regression classifier was employed to recognize different flower species [10]. The results showed that deep learning is more accurate than handcrafted feature extraction methods. In another hybrid approach, the authors used an autoencoder to obtain initial weights and then fine-tuned a CNN on top of it [11], with classification performed by a Support Vector Machine (SVM) layer at the end of the network. They showed that this combination achieved better accuracy than SVM alone, the combination of SVM and autoencoder, and the combination of CNN and SVM. An automatic identification CNN called D-Leaf was proposed in [12]. For comparison, features were extracted from a pre-trained, fine-tuned AlexNet and from D-Leaf, and the classifiers used were an Artificial Neural Network (ANN), SVM, k-NN, Naïve Bayes, and the CNN itself. When features were extracted with the proposed D-Leaf architecture and classified using ANN, a testing accuracy of 94.88% was obtained. The proposed model was also cross-validated and tested on freely available datasets, achieving a validation accuracy above 93%.

A valuable contribution to the automatic detection of leaves in their natural habitats used the pre-trained InceptionV2 model, incorporating its Batch Normalization layers in place of the convolutional layers of the Faster Region-based CNN (Faster R-CNN) architecture; this provided multiscale features to the region proposal network (RPN). Final classification was performed using softmax and a bounding-box regressor [13].

To discriminate distinctive features at different depths, the Multi-Scale Fusion CNN (MSF-CNN) was adopted, in which the input image is down-sampled into low-resolution images that are fed step by step into the network. Concatenation is used to fuse features at different scales, and the last layer aggregates all the learned features to predict the final plant species [14].
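The multi-scale input of MSF-CNN can be illustrated with a simple image pyramid. The sketch below, an assumption rather than the exact down-sampling used in [14], repeatedly halves a grayscale image with 2 × 2 average pooling, producing the progressively lower-resolution inputs that would be fed into the network stage by stage.

```python
def downsample_2x(image):
    # 2 x 2 average pooling with stride 2; an odd trailing row/column is dropped.
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def image_pyramid(image, levels):
    # Returns [full resolution, 1/2, 1/4, ...] with `levels` entries in total.
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid

img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # toy 8 x 8 image
for level in image_pyramid(img, 3):
    print(len(level), "x", len(level[0]))   # 8 x 8, then 4 x 4, then 2 x 2
```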

Adopting transfer learning allowed one leaf species identification system to take a close second place in PlantCLEF 2016 [15]. Pre-trained CNN architectures, i.e., GoogLeNet, AlexNet, and VGGNet, were fine-tuned, and various hyperparameters were studied using the plant task datasets of LifeCLEF 2015. Transfer learning was also employed to create 16 different architecture combinations of the pre-trained CNN model MobileNet to classify medicinal leaves [16]. Combining the feature extraction capabilities of pre-trained CNN models with Logistic Regression classification, the authors of [19] successfully classified leaves from the freely available Flavia [17] and Leaf Snap [18] datasets into their respective 32 and 184 classes. This shows that, despite the large difference in the number of classes, the proposed deep learning method gave excellent results. A comparison of six pre-trained CNN models demonstrated their classification capabilities even on small datasets [20].

3 Proposed framework: Ensemble Deep Learning-Automatic Medicinal Leaf Identification (EDL-AMLI)

The proposed framework is illustrated in Fig. 1. The medicinal leaf dataset images were first preprocessed and resized, and then divided into training and testing sets. The training images were used to individually train three Convolutional Neural Network models, namely MobileNetV2, InceptionV3, and ResNet50, using transfer learning. These models were loaded without their final layers so that our own pooling and dense layers could be added to output the leaf species. Each model was trained for 100 epochs, and validation accuracies were also recorded using threefold and fivefold cross-validation. The ensemble built on top of these component models combines their predictions by weighted averaging to produce the final prediction. The following subsections explain the component models, transfer learning, and ensemble learning.

Fig. 1

The Ensemble Deep Learning-Automatic Medicinal Leaf Identification (EDL-AMLI)

3.1 Component models

A Convolutional Neural Network (CNN) is a deep learning algorithm designed with the intuition of mimicking the intelligence of the human brain. It requires minimal image preprocessing compared with other machine learning algorithms. A CNN takes an input image and learns weights and biases for various features or objects in the image in order to distinguish one image from another. These networks are made up of layers, where the lower layers identify low-level features and the final layers distinguish images using high-level features. Fig. 2 visualizes the feature maps from the add blocks of the MobileNetV2 [21] architecture, showing that different blocks of the model highlight different features. Some layers capture the foreground, background, lines, etc.; layers close to the input capture fine details, and as the depth of the model increases, the feature maps show less and less detail. The CNNs employed in this paper are MobileNetV2, InceptionV3 [22], and ResNet50 [23]. Their architectures are explained in the subsections below.
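To make the notion of a low-level feature concrete, the sketch below slides a single 3 × 3 filter over a tiny grayscale image ("valid" cross-correlation, which is what CNN convolution layers compute) and applies a ReLU. The hand-set vertical-edge filter stands in for the kind of filter an early CNN layer typically learns; the image and filter values are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    # Cross-correlation without padding, stride 1.
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy 5 x 5 image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
feature_map = relu(conv2d_valid(image, vertical_edge))
for row in feature_map:
    print(row)   # strong responses where the window straddles the edge, zero in the flat region
```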

Fig. 2

Feature Maps from the MobileNetV2 CNN architecture

3.1.1 MobileNetV2

MobileNetV2 was designed as a simple, lightweight CNN that can be easily deployed on mobile devices. This model has 53 convolution layers and 1 average pooling layer and requires about 300 million multiply-add operations for a 224 × 224 input. Its main contribution is the inverted residual block with a linear bottleneck. It uses two types of convolutional layers: 1 × 1 convolution and 3 × 3 depthwise convolution. A full convolution is split into two layers: the first performs depthwise convolution, a lightweight filtering step that applies a single convolutional filter per input channel, and the second performs 1 × 1 pointwise convolution, which builds new features by computing linear combinations of the input channels. The authors further showed that linear bottlenecks improve performance, whereas non-linearities destroy information in low-dimensional spaces.
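The saving from splitting a full convolution into depthwise and pointwise steps can be checked with simple arithmetic. For an h × w output map, a k × k kernel, and c_in input and c_out output channels, a standard convolution costs h·w·k²·c_in·c_out multiply-adds, while the depthwise-plus-pointwise pair costs h·w·c_in·(k² + c_out), a ratio of 1/c_out + 1/k². The layer sizes below are illustrative, not a specific MobileNetV2 layer.

```python
def standard_conv_madds(h, w, k, c_in, c_out):
    # Every output pixel of every output channel looks at a k x k x c_in window.
    return h * w * k * k * c_in * c_out

def depthwise_separable_madds(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1 x 1 convolution mixing the channels
    return depthwise + pointwise

# Illustrative layer: 112 x 112 map, 3 x 3 kernel, 32 -> 64 channels.
std = standard_conv_madds(112, 112, 3, 32, 64)
sep = depthwise_separable_madds(112, 112, 3, 32, 64)
print(std, sep, round(sep / std, 4))   # ratio equals 1/64 + 1/9
```

With 3 × 3 kernels and many output channels the ratio approaches 1/9, which is why depthwise separable layers make the network light enough for mobile devices.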

3.1.2 InceptionV3

InceptionV3 was obtained by modifying earlier Inception architectures. It consumes less computational power and is more efficient in terms of the number of parameters and the memory and other resources it requires. Its main features are factorized convolutions to improve computational efficiency, smaller convolutions for faster training, asymmetric convolutions, an auxiliary classifier that acts as a regularizer, and grid-size reduction using pooling operations.
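The effect of factorized and asymmetric convolutions can be shown by counting weights. Replacing a 5 × 5 convolution with two stacked 3 × 3 convolutions (same receptive field) uses 18 instead of 25 weights per channel pair, and replacing a 3 × 3 convolution with a 1 × 3 followed by a 3 × 1 uses 6 instead of 9. The channel count below is an illustrative assumption and biases are ignored.

```python
def conv_weights(kh, kw, c_in, c_out):
    # Number of weights in one convolution layer (biases ignored).
    return kh * kw * c_in * c_out

C = 64  # illustrative channel count, equal for input and output

five_by_five = conv_weights(5, 5, C, C)
two_three_by_three = 2 * conv_weights(3, 3, C, C)                  # 5x5 -> two stacked 3x3
print(1 - two_three_by_three / five_by_five)                        # about 28% fewer weights

three_by_three = conv_weights(3, 3, C, C)
asymmetric = conv_weights(1, 3, C, C) + conv_weights(3, 1, C, C)   # 3x3 -> 1x3 then 3x1
print(1 - asymmetric / three_by_three)                              # about 33% fewer weights
```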

3.1.3 ResNet50

ResNet stands for Residual Network; the ResNet50 variant used here is 50 layers deep. Residual learning increases recognition accuracy and mitigates the vanishing-gradient and degradation problems of deep CNNs. It introduces identity-mapping shortcut connections that give the gradient a direct path around a layer, so a layer that is not needed can be effectively bypassed. This also helps reduce overfitting to the training set. Overall, residual connections make very deep neural networks easier to optimize.
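The identity shortcut can be illustrated with a toy residual block on plain Python lists: the block outputs ReLU(F(x) + x), so if the residual branch F learns all-zero weights, the input passes straight through. The layer sizes and weights are illustrative assumptions; real ResNet50 blocks use convolutions, batch normalization, and three-layer bottleneck branches.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x) with a two-layer residual branch F(x) = W2 . ReLU(W1 . x)."""
    f = matvec(W2, relu(matvec(W1, x)))              # residual branch
    return relu([fa + xa for fa, xa in zip(f, x)])   # identity shortcut, then ReLU

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With an all-zero branch the block reduces to ReLU(x): the layer is effectively bypassed.
print(residual_block(x, zeros, zeros))   # → [1.0, 0.0, 3.0]
```

Because the shortcut adds x unchanged, the derivative of the block's output with respect to x always contains an identity term, which is what keeps gradients from vanishing in very deep stacks.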

3.2 Transfer learning

Transfer learning has been widely used in image recognition, as highlighted in Sect. 2. Its purpose is to use existing knowledge to solve a new or different problem. The more features the source and target domains have in common, the easier knowledge transfer becomes. The main problem it addresses is the limited number of training samples in the target domain, which makes it hard for a deep learning algorithm to learn features. Depending on whether labeled samples are available in the source and target domains and whether the tasks are the same, transfer learning can be divided into inductive, transductive, and unsupervised transfer learning. Based on what is transferred, methods can be divided into instance transfer, feature-representation transfer, parameter transfer, and association-relationship transfer. Based on the feature spaces of the source and target domains, transfer learning can be divided into homogeneous and heterogeneous transfer learning.
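The feature-representation style of transfer used in this paper (a pre-trained extractor reused, a new classifier head trained) can be sketched in miniature. Here `frozen_extractor` and its fixed weights `FROZEN_W` are hypothetical stand-ins for a pre-trained CNN backbone, and the head is a softmax regression trained with plain stochastic gradient descent; only the head's weights change during training.

```python
import math

# Hypothetical stand-in for a pre-trained backbone: fixed weights, never updated.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]

def frozen_extractor(x):
    # Fixed linear map followed by ReLU, playing the role of the reused feature extractor.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_head(samples, labels, n_classes, epochs=200, lr=0.5):
    """Train only a new softmax head on top of the frozen features (plain SGD)."""
    feats = [frozen_extractor(x) for x in samples]   # computed once: backbone is frozen
    d = len(feats[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = softmax([sum(w * fi for w, fi in zip(W[c], f)) + b[c] for c in range(n_classes)])
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit c)
                for j in range(d):
                    W[c][j] -= lr * g * f[j]
                b[c] -= lr * g
    return W, b

def predict(W, b, x):
    f = frozen_extractor(x)
    z = [sum(w * fi for w, fi in zip(Wc, f)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(z)), key=z.__getitem__)

samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]   # toy "images"
labels = [0, 0, 1, 1]
W, b = train_head(samples, labels, n_classes=2)
print([predict(W, b, x) for x in samples])
```

Fine-tuning would additionally update some backbone weights; freezing them, as here, is the cheapest option when the target dataset is small.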

3.3 Ensemble learning

Ensemble learning combines several classifiers to improve performance. The main methods include bagging, boosting, and stacking. Ensembles can be homogeneous or heterogeneous: in the former, a single type of base classifier is trained on different datasets, while in the latter, different classifiers are trained on the same dataset. The ensemble then predicts the output by averaging, weighted averaging, or voting over the outputs of the base classifiers. For the automatic detection of medicinal leaves, we used a heterogeneous ensemble with weighted averaging to obtain the final result.
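A minimal sketch of weighted-average combination for a single image, assuming each component model outputs a softmax probability vector over the same classes. The weights are those reported for MobileNetV2, InceptionV3, and ResNet50 in Sect. 5.7; the probability vectors and the three-class setting are hypothetical.

```python
def weighted_average_ensemble(model_probs, weights):
    """Combine per-model class probabilities for one sample by a weighted average."""
    total = sum(weights)
    norm = [w / total for w in weights]          # make the weights sum to exactly 1
    n_classes = len(model_probs[0])
    combined = [sum(w * probs[c] for w, probs in zip(norm, model_probs))
                for c in range(n_classes)]
    predicted = max(range(n_classes), key=combined.__getitem__)
    return predicted, combined

# Weights reported for the component models (MobileNetV2, InceptionV3, ResNet50).
weights = [0.1033, 0.2263, 0.6703]
# Hypothetical softmax outputs of the three models for one leaf image over 3 classes.
mobilenet = [0.6, 0.3, 0.1]
inception = [0.5, 0.4, 0.1]
resnet = [0.1, 0.8, 0.1]
label, probs = weighted_average_ensemble([mobilenet, inception, resnet], weights)
print(label, [round(p, 3) for p in probs])
```

In this example two of the three models prefer class 0, so majority voting would output class 0, but the heavily weighted ResNet50 output makes the weighted average choose class 1. Normalizing the weights also absorbs rounding (the reported weights sum to 0.9999).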

4 Overview of the EDL-AMLI

The workflow of EDL-AMLI is as follows:

  (1) The medicinal leaf dataset, obtained from Mendeley, contains leaf images of 30 medicinal plant species. The training set consists of 1547 images and the testing set contains 294 images.

  (2) The RGB images were resized to 224 × 224 pixels.

  (3) To better understand the performance of the pre-trained models on the dataset, threefold and fivefold cross-validation were applied:

    {Med_leaf_training_set, Med_leaf_testing_set} = 3Cross(Med_leaf_dataset)

    {Med_leaf_training_set, Med_leaf_testing_set} = 5Cross(Med_leaf_dataset)

  (4) Individual classifiers were built from the pre-trained networks using transfer learning. MobileNetV2, InceptionV3, and ResNet50 were used as the base models, giving the following classifiers:

    MobileNetV2_softmax = TransferLearning(MobileNetV2, softmax)

    InceptionV3_softmax = TransferLearning(InceptionV3, softmax)

    ResNet50_softmax = TransferLearning(ResNet50, softmax)

  (5) Each model obtained in step (4) consists of a CNN feature extractor followed by fully connected layers of 128 and 30 neurons, the latter outputting the species name. The models were then trained on the training set:

    MobileNetV2_softmax = TransferLearning(MobileNetV2_softmax, Med_leaf_training_set)

    InceptionV3_softmax = TransferLearning(InceptionV3_softmax, Med_leaf_training_set)

    ResNet50_softmax = TransferLearning(ResNet50_softmax, Med_leaf_training_set)

  (6) An ensemble classifier then integrated the outputs of the three individual classifiers using weighted averages:

    Ensemble_learning_clf = Ensemble(MobileNetV2_softmax, InceptionV3_softmax, ResNet50_softmax)
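Step (5) fixes the size of the classification head, so the number of trainable parameters it adds can be counted directly. This sketch assumes a global average pooling layer on each backbone (the text says only "pooling"; global average pooling has no parameters) and the standard output depths of these ImageNet backbones without their top layers: 1280 channels for MobileNetV2 and 2048 for InceptionV3 and ResNet50.

```python
# Assumed feature depths after global average pooling (backbones without top layers).
BACKBONE_CHANNELS = {"MobileNetV2": 1280, "InceptionV3": 2048, "ResNet50": 2048}

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output neuron.
    return n_in * n_out + n_out

def head_params(feature_dim, hidden=128, n_classes=30):
    # Dense(128) followed by the 30-way softmax output layer.
    return dense_params(feature_dim, hidden) + dense_params(hidden, n_classes)

for name, channels in BACKBONE_CHANNELS.items():
    print(name, head_params(channels))
```

The head is small compared with the backbones (MobileNetV2 alone has about 3.4 million parameters), which is what makes training on roughly 1500 images feasible.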

5 Experiments and results

5.1 Experimental environment

All experiments were performed on an Intel Core i7-11370H CPU @ 3.30 GHz with an NVIDIA GeForce RTX 3070 Laptop GPU (8 GB GDDR6).

5.2 Evaluation index

Model performance was measured by accuracy, the most commonly used evaluation index: the higher the accuracy, the better the classifier performs. Accuracy is calculated as shown in Eq. (1):

$$\text{Accuracy}=\frac{\text{True Positive}+\text{True Negative}}{\text{True Positive}+\text{True Negative}+\text{False Positive}+\text{False Negative}}$$
(1)
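Eq. (1) is written in binary (positive/negative) terms. For a multi-class problem such as the 30-class task here, accuracy is the fraction of correctly classified images, and the two definitions agree when there are only two classes. The sketch below checks this on made-up labels.

```python
def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted class equals the true class (any number of classes).
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_counts(y_true, y_pred, positive=1):
    # Confusion-matrix counts for a two-class problem.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0]   # made-up labels
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = binary_counts(y_true, y_pred)
print((tp + tn) / (tp + tn + fp + fn), accuracy(y_true, y_pred))  # both equal 4/6
```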

5.3 Data and algorithm simulation experiment and analysis

The medicinal leaf image dataset consists of clean images divided into 30 classes. Some images from the dataset are shown in Fig. 3. The images were resized and standardized for input into the Convolutional Neural Networks MobileNetV2, InceptionV3, and ResNet50 for feature extraction and identification, and final classification was then performed using the ensemble of these deep network models. Because the dataset is small (the training set consists of 1547 images and the testing set of 294 images), threefold and fivefold cross-validation were used to observe the performance of the models. In k-fold cross-validation, a widely used technique, the dataset is split into k groups; each group in turn serves as the test set while the remaining k − 1 groups form the training set. The model is fit on the training set and evaluated on the test set, and the score is retained while the model is discarded. These scores help estimate the skill of the model. The trained models were saved and then input into the ensemble for classification.
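A minimal sketch of k-fold index splitting, assuming cross-validation over all 1841 images (1547 training + 294 testing) and a plain, unstratified shuffle with a fixed seed; the text does not say whether the folds were stratified by species.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for k in (3, 5):
    sizes = [len(test) for _, test in k_fold_splits(1841, k)]
    print(k, sizes)
```

With 30 species and only about 60 images per species, a stratified split (keeping each species' proportion equal across folds) would be a sensible refinement.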

Fig. 3

Sample leaf images from the Medicinal Leaf Dataset

5.4 Experiment one: MobileNetV2-softmax classifier experiment

In this experiment, the deep learning model uses the pre-trained MobileNetV2 model to extract features from the images, with a softmax layer on top for classification. The accuracies obtained across the three folds are shown in Table 1; the total run time recorded was 3248.46, and the standard deviation of classification accuracy is 0.8585. The accuracies across the five folds are shown in Table 2; the total run time recorded was 5110.82, and the standard deviation of classification accuracy is 0.6321. These low standard deviations suggest that the classifier is robust to the choice of training samples.
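The standard deviations quoted in these experiments summarize the spread of the per-fold accuracies. The text does not state whether the sample (divide by k − 1) or population (divide by k) formula was used, and with only three or five folds the two differ noticeably, so the sketch below computes both. The input values are placeholders, not results from Tables 1 to 6.

```python
import statistics

def summarize_folds(fold_accuracies):
    """Mean and two standard-deviation variants of per-fold accuracies."""
    mean = statistics.mean(fold_accuracies)
    sample_sd = statistics.stdev(fold_accuracies)       # divides by k - 1
    population_sd = statistics.pstdev(fold_accuracies)  # divides by k
    return mean, sample_sd, population_sd

# Placeholder per-fold accuracies (%) for illustration only.
print(summarize_folds([97.0, 98.0, 99.0]))
```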

Table 1 MobileNetV2-Softmax classification results (threefold cross-validation)
Table 2 MobileNetV2-Softmax classification results (fivefold cross-validation)

5.5 Experiment two: InceptionV3-softmax classifier experiment

In this experiment, the deep learning model uses the pre-trained InceptionV3 model to extract features from the images, with a softmax layer on top for classification. The accuracies obtained across the three folds are shown in Table 3; the total run time recorded was 4547.006, and the standard deviation of classification accuracy is 0.576. The accuracies across the five folds are shown in Table 4; the total run time recorded was 5136.526, and the standard deviation of classification accuracy is 0.816. Like the MobileNetV2-softmax classifier, this classifier is stable and largely unaffected by the particular samples in each fold. It also achieved better accuracy than MobileNetV2-softmax in a similar amount of time.

Table 3 InceptionV3-softmax classification results (threefold cross-validation)
Table 4 InceptionV3-Softmax classification results (fivefold cross-validation)

5.6 Experiment three: ResNet50-softmax classifier experiment

In this experiment, the deep learning model uses the pre-trained ResNet50 model to extract features from the images, with a softmax layer on top for classification. The accuracies obtained across the three folds are shown in Table 5; the total run time recorded was 7747.652, and the standard deviation of classification accuracy is 0.338. The accuracies across the five folds are shown in Table 6; the total run time recorded was 10408.949, and the standard deviation of classification accuracy is 0.545. This classifier recorded the best accuracy: 98.63% in threefold cross-validation and 98.91% in fivefold cross-validation. Its accuracies in threefold and fivefold cross-validation were nearly identical, which was not the case for the other two classifiers. However, its run time was considerably longer than that of the other two classifiers. This classifier is also insensitive to the samples in the dataset and is stable.

Table 5 ResNet50-softmax classification results (threefold cross-validation)
Table 6 ResNet50-Softmax classification results (fivefold cross-validation)

5.7 Experiment four: ensemble deep learning classifier experiment

In this experiment, the individual component classifiers are MobileNetV2-softmax, InceptionV3-softmax, and ResNet50-softmax. A weighted average ensemble allows the models to contribute to the prediction in proportion to their estimated performance, so each model's contribution depends on the weight assigned to it based on its accuracy. The weights and the accuracy obtained on the test set are shown in Table 7.
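The text does not specify how accuracies are turned into weights. Normalizing raw accuracies gives nearly uniform weights when the models perform similarly, so a common alternative, sketched here as an assumption rather than the authors' procedure, is a grid search over weight triples that sum to 1, keeping the triple with the best ensemble accuracy on held-out validation data. The function names and the toy validation outputs are hypothetical.

```python
from itertools import product

def ensemble_predict(model_probs, weights):
    """model_probs[m][i] is model m's class-probability list for sample i."""
    preds = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        combined = [sum(w * model_probs[m][i][c] for m, w in enumerate(weights))
                    for c in range(n_classes)]
        preds.append(max(range(n_classes), key=combined.__getitem__))
    return preds

def grid_search_weights(model_probs, labels, step=0.1):
    """Try every weight triple on a grid that sums to 1 (exactly three models assumed)."""
    n = round(1 / step)
    best_weights, best_acc = None, -1.0
    for a, b in product(range(n + 1), repeat=2):
        if a + b > n:
            continue                                  # keep all weights non-negative
        weights = (a / n, b / n, (n - a - b) / n)
        preds = ensemble_predict(model_probs, weights)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:                            # keep the first best triple found
            best_weights, best_acc = weights, acc
    return best_weights, best_acc

labels = [0, 1, 0]                                 # hypothetical validation labels
model_a = [[0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]     # hypothetical outputs, always wrong
model_b = [[0.4, 0.6], [0.7, 0.3], [0.45, 0.55]]   # hypothetical outputs, always wrong
model_c = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]     # hypothetical outputs, always right
print(grid_search_weights([model_a, model_b, model_c], labels))
```

The search should run on a validation split rather than the test set; tuning weights on test images would make the reported test accuracy optimistic.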

Table 7 Performance of the EDL-AMLI classifier on the test set

These observations indicate that the ensemble makes the identification of medicinal plant leaf species easier. The component classifiers performed well, and the weighted average method allows the models to contribute according to their performance. As Table 7 shows, ResNet50-softmax achieved an accuracy of 99.66% on the testing set and received a weight of 0.6703, reflecting its larger contribution to the prediction, followed by InceptionV3-softmax and MobileNetV2-softmax with weights of 0.2263 and 0.1033, respectively. The validation of the ensemble using threefold and fivefold cross-validation is presented in Tables 8 and 9. The average accuracy of the ensemble was 99.94% in threefold cross-validation, with a time of 74.64 s, and 99.94% in fivefold cross-validation, with a total time of 78.45 s. The standard deviations were 0.09815 and 0.112 for threefold and fivefold cross-validation, respectively, which shows that the model is robust. Cross-validating the ensemble also took considerably less time than cross-validating the individual models. The deep learning classifiers achieve high detection accuracy and perform well on the testing set, and the ensemble classifier (EDL-AMLI) improves classification accuracy over the individual classifiers and can be used for the automatic detection of medicinal plants from leaf images.

Table 8 Performance of the EDL-AMLI on the dataset (threefold cross-validation)
Table 9 Performance of the EDL-AMLI on the dataset (fivefold cross-validation)

6 Conclusion and future scope

Automatic detection of medicinal plants opens new doors for developing medicines for diseases that allopathy has not yet cured. It will allow laypeople to become aware of the plants growing in their surroundings and make the most of them to treat common ailments with few side effects. Artificial intelligence makes this goal even more achievable. We proposed an ensemble of deep learning models to automatically detect medicinal plants. The medicinal leaf images were obtained from a medicinal leaf dataset published on Mendeley. Using transfer learning, three component classifiers, namely MobileNetV2, InceptionV3, and ResNet50, were used without their top layers to extract features from the medicinal leaf images and were connected to dense layers trained on the 30-class medicinal leaf dataset with a softmax classifier. The component classifiers' performance was validated by threefold and fivefold cross-validation with 100 epochs to understand their behavior on the medicinal plant leaf dataset, and the time taken, accuracies obtained, and standard deviations were recorded. The component models were then trained on the training set and saved to disk. Final classification was performed by the EDL-AMLI classifier using the weighted average of the individual classifiers. The performance on the test set and in threefold and fivefold cross-validation was recorded, and the ensemble learning classifier EDL-AMLI was observed to perform better than both the individual base classifiers and the novel approach of [24], in which the authors used gradient boosting to improve the performance of CNN deep learning models for classifying plants into their species. The observed accuracy of 99.6% shows its potential for automatically detecting medicinal plants from leaf images.

In the future, we plan to create our own dataset to contribute to the research community. We also plan to work on ensembles of machine learning models and to test the approach on various other leaf image datasets [25].