1 Introduction

In recent years, Machine Learning (ML) has emerged as a game changer in many aspects of life. In agriculture, machine learning has been widely used to support production, including but not limited to automatic harvesting machines, production estimation, pest control, weed control, irrigation control, plant pathology (leaf disease classification), and fruit classification. Generally, plant diseases can manifest in different parts of a plant, such as its leaves, flowers and roots. Among them, the leaf is one of the most dominant and pronounced parts, because leaves supply the nutrients the plant needs to grow through photosynthesis Chouhan et al. (2020). Some leaf diseases cause leaves to drop or wither, directly affecting the plant’s yield and even its survival. This in turn decreases crop productivity and raises production costs. In the past, farms generally relied on labourers and experts for routine inspections and disease management. The disadvantages of this approach are obvious. First, it requires a great deal of manpower and cost. Second, labourers need training and easily become fatigued by manual work. Third, it is difficult to detect leaf diseases in a timely manner and on a large scale. Fourth, diagnosis may be subjective due to human error and bias. Thus, an effective leaf disease classification approach is a basic need for plant cultivation. Fortunately, ML approaches have recently emerged as a better solution than traditional methods, showing their effectiveness and ease of use in plant leaf pathology classification through the analysis of plant leaf images. Plant leaf images have several advantages. Datasets of leaves are relatively easy to collect, analyse and reproduce (e.g., using a camera). 
We can also extract useful features (e.g., species, health state, age, and disease category), which can help improve the quality and quantity of agricultural production. Therefore, efficient and timely identification and classification of plant diseases is key to reducing production losses. Nowadays, with the introduction of precision agriculture (PA) or smart agriculture (SA) Vijaykanth Reddy and Sashi Rekha (2021); Gajjar et al. (2021); Chouhan et al. (2021); Mureşan et al. (2020); Chouhan et al. (2020); Bangari et al. (2022), ML technologies have been researched and employed, especially in plant leaf pathology classification. Combined with Big Data and the Internet of Things (IoT), ML can automatically detect plant leaf diseases as early as possible. Currently, ML applications have been deployed on various hardware and software platforms, e.g., mobile phone applications Paymode et al. (2021), websites Wadhawan et al. (2020) and smart glasses Ponnusamy et al. (2020). With the increasing demand for ML in smart agriculture, a comprehensive survey on leaf disease classification will benefit interested researchers and farmers. This paper provides the research and industry communities with useful information on the available data and techniques, their advantages and weaknesses, and their applicability.

In recent years, there has been a growing interest in utilising machine learning for leaf disease classification. Several surveys have been conducted on this research topic; however, we have identified certain limitations within the reviewed works. The scope of the reviewed papers was often narrow, failing to encompass the broader concept of machine learning in leaf disease classification. Additionally, many of the reviewed papers were outdated, indicating a need for more up-to-date research in this area. Furthermore, a comprehensive review of available datasets for leaf disease classification is still lacking. It is also necessary to conduct a thorough review of the various machine learning approaches that have been employed. Currently, recent surveys have predominantly focused on emerging deep learning techniques, such as Convolutional Neural Networks (CNN). However, due to the diverse techniques and datasets used in each survey, it remains challenging to analyze and compare research outcomes. Moreover, while numerous software applications of machine learning for pathology, including leaf-disease analysis, have been developed recently, there is a lack of comprehensive review in this specific domain.

This paper provides a comprehensive view of current achievements and trends in the application of ML for leaf disease classification. Currently, leaf disease classification approaches can be categorised into traditional (shallow) ML, Deep Learning (DL) and Augmented Learning (AL). DL is a branch of ML, and AL is a research topic that aims to improve the effectiveness and usefulness of ML approaches. In shallow learning, feature extraction plays an important role and, in many cases, requires the involvement of experts, i.e. to engineer useful features. Deep learning, on the other hand, may reduce the cost of feature engineering as it can facilitate effective learning over a large amount of data. Although deep learning is often data-hungry, leaf images are relatively easy to collect and farmers can help with disease annotation. To further reduce the reliance on labelled data, data augmentation methods have been used to produce more training data and enhance model robustness. Transfer learning is also a promising approach for this task, as it can reduce the need for leaf data by utilising pre-trained models from other tasks. As we can see, the keys to the success of ML approaches are the quality and quantity of data. Therefore, unlike previous surveys, we discuss the availability and quality of public datasets and their suitability for evaluating ML models.

The organisation of the paper is as follows. In the next section, we will explain how we collect and analyse related literature. Section 3 will discuss the gaps in existing review and survey papers. After that, Sect. 4 presents the available public datasets for leaf disease classification. This would help researchers to find, apply, and evaluate their ideas quickly. In Sect. 5, we categorise and compare machine learning approaches, by dividing them into three main groups: traditional (shallow) ML approaches, deep learning (DL), and transfer learning (TL). In Sect. 6, we present related applications available for leaf disease classification in real-life. Finally, Sect. 7 will summarise our findings and discuss the potential directions for future work on this research topic. This paper aims to provide some useful resources for the study and application of leaf disease classification with machine learning.

2 Methodology

The literature for this study was gathered from a series of well-known databases, including EBSCOhost, Scopus and Google Scholar. The search keywords included “leaf disease”, “plant disease”, “machine learning”, “deep learning”, “classification”, “detection”, etc. In this review, we first focus on quality papers by filtering them using three metrics: (1) number of citations; (2) rank of the publication venue (Q1 for journals, and CORE rank A/A* for conferences); and (3) relevance. In addition, beyond these criteria, we also studied as many relevant articles as we could find to avoid omissions. As shown in Table 1 and Fig. 1, the referenced academic articles are mainly from recent years (2015–2022). In Table 1, the review papers are denoted with asterisks. Of the papers published from 2020 to 2022, there are 15 review papers and 71 technical papers. In Fig. 1, the number of papers increases year by year, and substantially so in recent years, which reflects the growing interest in plant leaf disease detection and classification.

Table 1 The publication years of Referenced Academic Articles
Fig. 1 The number and publication years of referenced articles. The red color indicates the number of review papers on leaf disease classification. (Color figure online)

3 Related work

Table 2 Recent review papers

As the interest in leaf disease classification with machine learning has been increasing recently, there are several surveys related to this research topic. In this section, we analyse recent review papers about leaf disease detection or classification. Table 2 summarises their scope and the gaps they left behind. As we can see, previous surveys focused on different aspects of leaf disease classification, shedding light on some key areas of the research topic, but a comprehensive study is still missing.

First, we found that many related works have a narrow scope. The number of papers reviewed is not adequate to cover the broad concept of ML in leaf disease classification, and many papers used in the reviews are not up to date. For example, in Raina and Gupta (2021); Ekanayake and Nawarathna (2021), no more than 20 articles were selected from Google Scholar. Another survey paper, Nisar et al. (2020), published in 2020, analysed only articles from before 2017. Mureşan et al. (2020) analysed 26 academic papers about leaf disease detection and classification from 2015 to 2020. Applalanaidu and Kumaravelan (2021) surveyed more than 45 academic papers about plant disease detection and classification from 2017 to 2020. Agarwal et al. (2021) covered 12 papers focusing on deep learning techniques only. Kumar et al. (2022) reviewed shallow ML (10 articles) and DL (20 articles, including TL). Metre and Sawarkar (2022) surveyed image processing with ML (3 articles), DL (5 articles) and SI (5 articles). Bangari et al. (2022) included just 8 articles on potato leaf disease classification. In a recent survey, Bhagat (2022), 179 papers were studied; however, only 12 articles are from recent years (2020–2022) and not all of them are about leaf disease classification (the survey also covers plant species classification). In contrast, our paper focuses on more recent studies.

Second, we found that a comprehensive review of the available datasets for leaf disease classification is still missing. Many researchers have already noted that the primary obstacle in this research topic is the availability of datasets Chouhan et al. (2020); Kumar et al. (2022); Li et al. (2021); Agarwal et al. (2021). For example, Lu and Young (2020) surveyed 34 agricultural datasets; however, only one of them, the Maize Leaf (NLB) dataset Wiesner-Hanks et al. (2018), is related to leaf diseases. Unfortunately, many datasets introduced in the related work listed here are private Chouhan et al. (2020); Mureşan et al. (2020); Kumar et al. (2022); Agarwal et al. (2021). Plant Village is one of the most popular public datasets Raina and Gupta (2021); Ekanayake and Nawarathna (2021); Agarwal et al. (2021); Kumar et al. (2022); Metre and Sawarkar (2022); Bangari et al. (2022); Bhagat (2022). This dataset is useful for scientific research; however, it has some pitfalls because its images were taken under laboratory conditions. In Kumar et al. (2022); Agarwal et al. (2021); Metre and Sawarkar (2022), the authors stressed the importance of real-field datasets. In another study, a combination of public (55%, based on Plant Village) and private (25%) data is used Bhagat (2022). Recently, there have been more calls for leaf disease data to be made available, to bring greater benefits to both the scientific and industrial communities Agarwal et al. (2021).

Third, there are many different machine learning approaches, and they need to be reviewed thoroughly. Early survey studies focus on traditional (shallow) approaches such as Artificial Neural Networks (ANN), Support Vector Machine (SVM), AdaBoost, KNN, Decision Tree, and Naïve Bayes (NB) Raina and Gupta (2021); Nisar et al. (2020); Applalanaidu and Kumaravelan (2021); Bangari et al. (2022); Metre and Sawarkar (2021); Bhagat (2022). In these approaches, data pre-processing and feature engineering are usually needed Raina and Gupta (2021); Li et al. (2021); Nisar et al. (2020); Metre and Sawarkar (2021). Feature engineering is an important step that extracts the features of images as inputs for ML models Applalanaidu and Kumaravelan (2021). Normally, hand-crafted features are extracted, which requires the involvement of humans, i.e. domain experts, to define useful features. For feature extraction, there exists a wide range of methods, including Local Binary Patterns (LBP) Histogram, Speeded Up Robust Features (SURF), Scale Invariant Feature Transform (SIFT), Gabor Energy Filtering, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Generalized Extreme Value (GEV) Distribution and Johnson SB Distribution Bhagat (2022).

Recent surveys have revolved around new techniques, including deep learning methods such as CNN Divya et al. (2021); Applalanaidu and Kumaravelan (2021); Bangari et al. (2022); Agarwal et al. (2021); Bhagat (2022), AlexNet, GoogLeNet, and VGGNet Applalanaidu and Kumaravelan (2021); Bangari et al. (2022); Bhagat (2022), and Pooling Dilated CNNs Raina and Gupta (2021). Recently, the traditional (shallow) approach has increasingly been replaced by deep learning methods Geetharamani and Pandian (2019), as the former may suffer from side effects (Sharma et al. (2021); Bir et al. (2020)) due to human errors/biases during the feature engineering step. A number of experimental results showed that DL is a powerful and useful way to detect and classify leaf diseases Mureşan et al. (2020); Li et al. (2021); Raina and Gupta (2021); Bangari et al. (2022); Divya et al. (2021); Applalanaidu and Kumaravelan (2021). DL technologies are relatively user-friendly and can extract image features and classify plant diseases automatically Li et al. (2021). For example, the higher accuracy of DL compared to the traditional (shallow) approach was demonstrated by Kumar et al. (2022). They found that DL models, with and without pre-training, achieved average accuracies of 99.64% and 98.64% respectively, surpassing the 95.71% accuracy of the traditional approach. To improve further, recent studies enhance the performance of machine learning models, especially deep learning, with supplementary techniques such as segmentation Kumar et al. (2022); Metre and Sawarkar (2022); Bhagat (2022), data augmentation Li et al. (2021), and transfer learning Li et al. (2021); Divya et al. (2021); Agarwal et al. (2021), or a combination of traditional and deep learning Applalanaidu and Kumaravelan (2021). Li et al. (2021) claimed that transfer learning would be the most effective method to boost the robustness of CNN classifiers. 
Applalanaidu and Kumaravelan (2021) employed a combination of different segmentation algorithms to extract better features of the images.

As we can see, each survey focuses on a different set of techniques and data based on various timelines. This makes it difficult to analyse and compare the research outcomes. Moreover, many software applications of ML for pathology, including leaf-disease analysis, have been developed recently and there is a lack of a review in this aspect. In this paper, we will address the limitations above by providing a comprehensive review of recent studies, public datasets, machine learning techniques, and real-life applications of machine learning in leaf disease classification.

4 Datasets

Data plays a critical role in modern AI, especially with the recent emergence of deep learning techniques. Increasing the quantity and quality of training data improves the performance of the large models used in deep learning Goodfellow et al. (2016). In research and practice, the role of image datasets for computer vision tasks is self-evident. Chouhan et al. (2020) showed that the foremost challenge for research is the lack of available datasets. For leaf disease classification, in recent years, many researchers have devoted themselves to the collection of plant disease data for public use. Table 3 and Fig. 2 show recently available public datasets of plant leaf diseases for computer vision research. In the table, the “Year” column gives the year in which a dataset was published. “Species” shows the number of plant species. The “Diseases” column lists the number of unique diseases. We also include a “Class” column to show the number of original classes in the dataset, as some datasets combine species and diseases as labels. We categorise the datasets into a multi-species group and a single-species group according to their species diversity.

Table 3 Public leaf disease datasets
Fig. 2 The structure of public datasets

4.1 Single-species datasets

A single-species dataset is specific to one plant species. It can be used in the detection, classification or severity assessment of a specialised plant.

4.1.1 Plant pathology 2021 - FGVC8 dataset

Plant Pathology 2021-FGVC8 is an apple leaf disease image dataset from a Kaggle challenge competition. The competition was part of the Fine-Grained Visual Categorization (FGVC8) workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2021. This dataset is characterised by each leaf having one or several labels. It contains around 23,000 apple leaf images and six apple leaf health categories: “healthy”, “complex”, “rust”, “frog eye leaf spot”, “powdery mildew”, and “scab”. Among them, “complex” means a leaf is unhealthy but an exact cause (disease) cannot be identified. This dataset would be useful for multi-class apple leaf disease classification.
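
Because each leaf in this dataset may carry one or several labels, a classifier typically needs a multi-hot target vector rather than a single class index. The following is a minimal sketch of such an encoding. The underscore-style class names and the space-separated label string are assumptions made for illustration only and are not taken from the dataset files.

```python
# Class names follow the six categories listed above. The underscore spelling
# and the space-separated label format are assumptions for illustration.
CLASSES = ["healthy", "complex", "rust", "frog_eye_leaf_spot", "powdery_mildew", "scab"]


def encode_labels(label_string, classes=CLASSES):
    """Turn a space-separated label string into a multi-hot vector."""
    present = set(label_string.split())
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in classes]


def decode_labels(vector, classes=CLASSES):
    """Inverse of encode_labels: recover label names from a multi-hot vector."""
    return [name for name, flag in zip(classes, vector) if flag]


if __name__ == "__main__":
    vec = encode_labels("scab frog_eye_leaf_spot")
    print(vec)
    print(decode_labels(vec))
```

With this representation, a leaf showing two diseases has two active entries, whereas a single-label leaf has exactly one.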

4.1.2 Maize leaf (NLB) dataset

The Maize Leaf (NLB) dataset was collected through several image-capture methods proposed by Wiesner-Hanks et al. (2018), including handheld cameras, cameras mounted on a 5 m boom, and cameras mounted on a drone. The dataset has 18,222 maize plant images with 105,735 Northern leaf blight (NLB) lesions annotated by experts. It is considered the largest open dataset of a single plant species at present, and will be helpful for maize disease classification and severity assessment.

4.1.3 Citrus dataset

The Citrus dataset has two folders, containing 150 images of citrus fruits and 609 images of citrus leaves respectively. Each folder has 5 categories (black spot, canker, greening, melanose, and healthy). All images were annotated by experts.

4.1.4 Rice diseases image dataset

Rice diseases image dataset has four categories of rice leaves: Brown Spot (523 images), Healthy (1488 images), Hispa (565 images) and Leaf Blast (779 images). The dataset has been studied in several works Kathiresan et al. (2021); Bifta Sama et al. (2021) for leaf disease classification.

4.1.5 JMuBEN datasets (JMuBEN, JMuBEN2, JMuBEN3)

This is a group of datasets (JMuBEN, JMuBEN2, JMuBEN3) released by the same authors Jepkoech et al. (2021), all collected with a camera under the guidance of plant pathologists. JMuBEN and JMuBEN2 contain images of Arabica coffee leaves taken in real coffee plantations, and can be combined into a larger dataset. JMuBEN has three categories: 7682 Cercospora images, 8337 rust images and 6572 Phoma images. JMuBEN2 has two categories: 16,979 Miner images and 18,985 healthy images. JMuBEN3 contains sweet potato leaves, all of which are affected by leaf rust, so it has just one category: 1383 sweet potato leaf rust images. The JMuBEN3 dataset folder also contains the authors’ code for a sweet potato leaf rust classification model. Some images of JMuBEN and JMuBEN2 were augmented by rotation and flipping to increase the dataset size and prevent over-fitting Jepkoech et al. (2021). These datasets are useful for deep learning research and study.

4.1.6 Cassava disease dataset

The Cassava disease dataset comes from a Kaggle challenge competition that was part of the Fine-Grained Visual Categorization workshop (FGVC6) at CVPR 2019. It contains 1 healthy and 4 disease categories: Cassava Brown Streak Disease (CBSD), Cassava Mosaic Disease (CMD), Cassava Bacterial Blight (CBB) and Cassava Green Mite (CGM). All images were collected by 200 farmers using smartphones and labelled by experts. The dataset has two parts: a training set (9436 annotated images) and a test set (12,595 unlabeled images). The experts also scored the disease severity (from 1 to 5); however, the Kaggle release did not include the scores Mwebaze et al. (2019).

4.1.7 UCI rice leaf diseases dataset

The UCI Rice Leaf Diseases dataset is intended for rice plant disease detection and classification Prajapati et al. (2017). It has three disease categories: Bacterial leaf blight, Brown spot, and Leaf smut, and each category has 40 images. Its limitation is its small size (120 images in total). It can be useful for prototyping and quickly testing machine learning methods but may not be suitable for deep learning approaches, which require large amounts of data.

4.2 Multi-species datasets

A multi-species dataset is composed of a variety of plant species, each with its own (possibly overlapping) set of diseases. The datasets in this group can be used for both species classification and disease classification.

4.2.1 Plant Village dataset

The Plant Village dataset is currently the most widely used public dataset for leaf disease classification. It has two versions: an original version and a data-augmented version. The original dataset was published in 2016 by Hughes and Salathe (2016) with 54,305 images of diseased or healthy leaves from 14 plant species (e.g., Apple, Blueberry, Cherry and Corn). Each species has 1–10 classes, covering its diseases and the healthy state (22 unique disease categories in total). The dataset folder has a total of 38 classes that combine species and diseases (e.g., Apple black rot), and one additional category of about 1143 background images (without leaves). The data-augmented version was released in 2019 by Hughes and Salathé (2015), who used six data augmentation methods (i.e. image flipping, Gamma correction, noise injection, PCA colour augmentation, rotation, and scaling) to enhance the data. As a result, the dataset grew from 54,305 to 61,486 images.
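
To make the augmentation operations named above more concrete, the following minimal sketch implements four of them (flipping, rotation, Gamma correction and noise injection) on a grayscale image stored as a list of pixel rows. It is an illustration under stated assumptions and not the procedure used to build the augmented dataset: the function names, the gamma value of 0.8 and the noise level of 5.0 are hypothetical choices, and PCA colour augmentation and scaling are omitted for brevity.

```python
import random


def flip_horizontal(img):
    """Mirror each pixel row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def gamma_correct(img, gamma):
    """Apply v -> 255 * (v / 255) ** gamma to every 8-bit pixel value."""
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in img]


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip the result to the range 0..255."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row]
            for row in img]


def augment(img):
    """Return four augmented copies of one image (parameter values are illustrative)."""
    return [
        flip_horizontal(img),
        rotate_90_clockwise(img),
        gamma_correct(img, 0.8),
        add_gaussian_noise(img, 5.0),
    ]


if __name__ == "__main__":
    image = [[10, 50, 90], [20, 60, 100], [30, 70, 110]]
    for copy in augment(image):
        print(copy)
```

Each augmented copy keeps the label of its source image, which is why such operations enlarge a labelled dataset without additional annotation effort.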

4.2.2 Plant leaves dataset

The Plant Leaves dataset consists of 4502 images of healthy and unhealthy leaves divided into 22 categories by species and state of health. The images are in high-resolution JPG format. The 12 plant types are Alstonia Scholaris, Arjun, Bael, Basil, Chinar, Guava, Jamun, Jatropha, Lemon, Mango, Pomegranate, and Pongamia Pinnata. Note that the Bael class only has diseased leaves and the Basil class only has healthy leaves.

4.2.3 Plantae_K dataset

Plantae_K dataset contains 2153 images of healthy and unhealthy plant leaves, divided into 16 categories by species and state of health (e.g., apple healthy and apple diseased). The images are in high-resolution JPG format. There are 8 fruit types in this dataset, including Apple, Apricot, Cherry, Cranberry, Grapes, Peach, Pear and Walnut.

4.2.4 PlantDoc dataset

Compared to the Plant Village dataset, the PlantDoc dataset aims to provide real-field images. Singh et al. (2020a) were concerned that the images of Plant Village (e.g., see Fig. 3a) were all taken in laboratory setups and not in the real conditions of cultivation fields, which would impact the efficacy of trained models in real-life applications. Based on that, they built the PlantDoc dataset as a sufficiently large-scale non-lab dataset for leaf disease classification. The images in PlantDoc have cluttered backgrounds and do not follow a standard format. A comparison between Plant Village images and PlantDoc images can be seen in Fig. 3. PlantDoc has similar categories to Plant Village, with 2598 leaf images from 13 plant species. In this dataset, there are 17 unique disease categories and 38 classes for the combination of species and diseases (e.g., Apple Scab Leaf). The images were annotated by experts.

Fig. 3 Apple scab leaf samples

Take-home messages

1. The Maize Leaf (NLB) dataset is considered the largest public single-species dataset, while Plant Village is the most popular dataset.

2. Plant Village, Plant Leaves and Plantae_K are laboratory datasets, which can be useful for prototyping and evaluating machine learning models. However, real-field datasets would provide a more comprehensive evaluation and better support for realistic applications.

3. We found that the available datasets are very useful for domain adaptation and multi-task learning; however, this is largely missing in the current literature. We suggest training a machine learning model on different datasets in a compositional manner, so that the model can effectively adapt to new tasks/datasets as they are added.

5 Machine learning approaches

Currently, there are three general directions for machine learning approaches to leaf disease classification (see Fig. 4): shallow learning (SL), deep learning (DL), and augmented learning (AL). In shallow learning approaches, leaf localisation is always done first, and the diseases are then classified based on the localised diseased leaves. In addition, feature extraction is a necessary step of shallow learning, extracting the features of leaves before classification. Deep learning has recently emerged as a great tool for leaf disease classification thanks to its ability to offer an end-to-end process for learning and prediction. Deep learning does not require the feature engineering step and is able to learn an effective classifier directly from input images. At present, the comparison of the advantages and disadvantages of shallow learning and deep learning approaches is still inconclusive. However, there is strong agreement that SL has disadvantages in leaf image classification tasks, such as the inability to scale to large datasets, complex processing pipelines, and especially the need for feature extraction Applalanaidu and Kumaravelan (2021); Li et al. (2021). DL, however, also has two main disadvantages: it is computationally expensive and data-hungry. With the development of related hardware and computing systems, the computational cost of DL has been alleviated. To address data hungriness, recent approaches employ augmented learning techniques that generate artificial data and/or reuse pre-trained models from other domains/tasks.

Fig. 4 ML development in leaf disease classification

Table 4 Machine learning technologies

5.1 Shallow learning

Table 4 summarises the studies that use shallow machine learning approaches (where there is a comparison, the highest accuracy is in bold). We focus on recent and notable papers from 2019. The general stages of leaf disease identification and classification using shallow learning are: data (image) acquisition, processing, segmentation (optional Metre and Sawarkar (2022, 2021)), feature extraction, and identification (or classification) Raina and Gupta (2021); Li et al. (2021); Metre and Sawarkar (2022, 2021). While data acquisition, processing, and segmentation are common to image processing generally, in this section we discuss the two aspects that directly affect the quality of leaf disease classification: feature engineering and classifiers.

5.1.1 Feature engineering

Normally, data is collected with digital cameras (sometimes specialised cameras are used) to obtain basic features in colour models such as RGB Chaudhari and Patil (2020), HSV Kirti Rajpal (2020); Mukhopadhyay et al. (2021), and CIELAB Chaudhari and Patil (2020); Kirti Rajpal (2020). Among the three colour models, HSV is more popular than the others. For example, Chaudhari and Patil (2020) collected 618 images from farms in RGB format, which were then converted to the CIELAB colour space and resized to 400 × 600 pixels. In Kirti Rajpal (2020), the authors used two colour models (HSV and CIELAB) on Plant Village data to perform segmentation for feature extraction. Mukhopadhyay et al. (2021) collected 312 samples of tea leaves from three Indian tea gardens and converted them from RGB format to HSV for data pre-processing. From a colour model, we can extract more task-related features based on the spatial structure of the image data. The two most common methods for feature extraction are K-means clustering Kirti Rajpal (2020); Padol and Yadav (2016); Kumar et al. (2020); Chaudhari and Patil (2020) and grey-level co-occurrence matrix (GLCM) Bharate and Shirdhonkar (2020); Tulshan and Raul (2019); Dang-Ngoc et al. (2021); Kumar et al. (2020); Shahidur Harun Rumy et al. (2021). From the literature, we found that GLCM features achieve better performance than K-means features. Other extraction methods from image processing are employed as well. In Mukhopadhyay et al. (2021), the authors used the Non-dominated Sorting Genetic Algorithm (NSGA-II) to detect the diseased area of the tea leaf and then applied Principal Component Analysis (PCA) to extract the 5 most significant features for classification. In Singh et al. (2020b), features are extracted from Region of Interest (RoI) segmentation. In Das et al. (2020), Gaussian blur and Haralick’s algorithm are applied to extract 60 texture features. A comparison of different feature extractors was presented in Gadade and Kirange (2020). 
In this study, 9 different feature extraction methods are used, including Colour Mean Pixel Value, Colour Moments, Edge Feature extraction using the Prewitt operator, Gabor feature extraction, Histogram feature extraction, Haar features, Histogram of Oriented Gradients (HOG), and Local Binary Patterns (LBP). Among them, HOG features perform the best. Besides standard approaches in image processing, a novel feature extraction method based on Local Binary Patterns (LBP), dedicated to leaf diseases, was proposed in Barburiceanu et al. (2020). The authors claimed that, compared to recent grayscale LBP-based approaches, the new feature extraction method improved accuracy, precision and recall significantly.
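
Since GLCM is one of the two most common feature extractors discussed above, a minimal sketch of how such texture features can be computed may be helpful. The code below builds a normalised co-occurrence matrix for one pixel offset and derives three commonly used statistics (contrast, angular second moment and homogeneity). It is a simplified illustration and not the implementation of any cited study: the function names, the single horizontal offset and the small number of grey levels are assumptions.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for pixel pairs (y, x) -> (y + dy, x + dx).

    img is a list of rows of integer grey levels in the range 0..levels-1.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(img), len(img[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[img[y][x]][img[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("image is too small for the chosen offset")
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, angular second moment (ASM) and homogeneity of a GLCM p."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


if __name__ == "__main__":
    # Toy 3 x 3 image quantised to 3 grey levels.
    image = [[0, 0, 1],
             [0, 0, 1],
             [2, 2, 2]]
    print(texture_features(glcm(image, levels=3)))
```

In practice, several offsets and directions are usually combined, and the resulting statistics form the feature vector passed to a classifier.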

Combination of features extracted from different techniques. Jayaprakash and Balamurugan (2021) first pre-processed all tomato leaf images with the Gaussian filtering (GF) technique. After that, they applied two feature extractors, local binary patterns (LBP) and Scale Invariant Feature Transform (SIFT), whose pairing with different classifiers is discussed in Sect. 5.1.2.
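
As an illustration of the LBP descriptor mentioned above, the following sketch computes the basic 3 × 3 LBP code of every interior pixel of a grayscale image and summarises the codes as a normalised histogram. It is a textbook-style simplification, not the method of the cited works: the function names and the clockwise bit ordering are assumptions, and refinements such as rotation-invariant or uniform patterns are omitted.

```python
def lbp_codes(img):
    """Basic 3 x 3 LBP code for every interior pixel of a grayscale image."""
    # Neighbour offsets (dy, dx), clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img):
    """Normalised 256-bin histogram of the LBP codes of an image."""
    hist = [0] * 256
    count = 0
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
            count += 1
    if count == 0:
        raise ValueError("image must be at least 3 x 3 pixels")
    return [h / count for h in hist]


if __name__ == "__main__":
    image = [[5, 5, 5, 5],
             [5, 9, 1, 5],
             [5, 5, 5, 5]]
    print(lbp_codes(image))
```

A histogram of this kind is a fixed-length vector, so it could in principle be concatenated with other descriptors before classification.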

5.1.2 Classifiers

SVM was the most common ML classifier used to classify leaf diseases Kirti Rajpal (2020); Barburiceanu et al. (2020); Singh et al. (2020); Das et al. (2020); Gadade and Kirange (2020); Shahidur Harun Rumy et al. (2021); Padol and Yadav (2016); Bharate and Shirdhonkar (2020); Dang-Ngoc et al. (2021); Kumar et al. (2020); Chaudhari and Patil (2020); Mukhopadhyay et al. (2021). Padol and Yadav (2016) used a linear SVM to detect grape leaf disease, achieving 88.98% accuracy. However, the linear kernel only works well if the data is linearly separable, which is not the case in many applications. In Kirti Rajpal (2020), three different SVM kernels (Linear, Polynomial, RBF) were compared on HSV and CIELAB features for Black rot disease classification in grape plants. The results showed that an SVM model with the RBF kernel gained the best accuracy of 94.1%. SVM was reported to be applied successfully to banana leaves (85% average accuracy) Chaudhari and Patil (2020), tea leaves (83% average accuracy, 78% F1-score) Mukhopadhyay et al. (2021), and grape vine disease (97.2% average accuracy) Singh et al. (2020b). A comparison between SVM, Logistic Regression and Random Forest has been studied in Das et al. (2020) for tomato leaf disease classification. The results showed that SVM significantly outperforms Logistic Regression (20% better accuracy) and Random Forest (17% better accuracy). In Gadade and Kirange (2020), a more comprehensive comparison was carried out between SVM and 4 competitors (Linear Regression, KNN, Naïve Bayes and Decision Tree) using 9 different types of features. It also concluded that SVM performs the best on tomato leaf disease diagnosis and severity measurement. A new SVM model, known as hierarchical SVM, was proposed in Dang-Ngoc et al. (2021) to detect citrus leaf diseases, where it achieved 91.76% accuracy in comparison to 88.24% from traditional SVMs.
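
The three SVM kernels compared above differ only in how they measure the similarity between two feature vectors. The sketch below writes out the standard textbook definitions of the linear, polynomial and RBF kernels. The function names and the default hyperparameter values (degree, gamma, coef0) are assumptions chosen for illustration and are not the settings used in the cited studies.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """k(u, v) = <u, v>"""
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """k(u, v) = (gamma * <u, v> + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """k(u, v) = exp(-gamma * ||u - v||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    a, b = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(a, b))
    print(polynomial_kernel(a, b, degree=2))
    print(rbf_kernel(a, b, gamma=0.5))
```

The RBF kernel depends only on the distance between the two vectors, which allows the resulting decision boundary to be non-linear in the original feature space. This is one reason it can succeed where a linear kernel fails on data that is not linearly separable.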

Besides SVM, other classifiers can achieve high performance if suitable features are selected. For example, on a small private dataset, K-Nearest Neighbor (KNN) reached 98.56%, better than the 97.6% obtained by SVM Tulshan and Raul (2019). In Bharate and Shirdhonkar (2020), KNN outperforms SVM when using GLCM features for grape leaf images, achieving 96.66% compared with 90% for the latter. For rice leaf disease classification Shahidur Harun Rumy et al. (2021), six ML algorithms (RF, Naïve Bayes, Decision Trees, Logistic Regression, KNN and SVM) were compared. The feature set was a combination of colour histogram, Hu moments shape features and Haralick texture features, which enabled RF to achieve the best performance (97.50% accuracy) on an IoT device (Raspberry Pi). Jayaprakash and Balamurugan (2021) first pre-processed all tomato leaf images with Gaussian filtering (GF). They then combined two feature extractors, local binary patterns (LBP) and Scale Invariant Feature Transform (SIFT), with two ML classifiers, multilayer perceptron (MLP) and random forest (RF), to classify the tomato diseases. The accuracy of each extractor-classifier pair was: SIFT & MLP 92.40%, SIFT & RF 91.20%, LBP & MLP 90.40% and LBP & RF 89.30%. Decision Tree is a simple classifier and can be useful for small datasets with a small number of classes Rajesh et al. (2020). That paper shows that, after relabelling the classes from four disease labels and one healthy label into two classes, ‘healthy’ and ‘unhealthy’, Decision Tree can achieve 96% accuracy.

Take-home messages

1. Shallow machine learning requires feature extraction from images Applalanaidu and Kumaravelan (2021) to be useful for the disease classification task. The two most common methods are K-means clustering and the grey-level co-occurrence matrix (GLCM), of which GLCM is the more widely recommended. Combining features is also encouraged, as it can help improve performance.

2. Support vector machine (SVM) was the most common ML method for leaf disease classification. It is suitable both for smaller datasets (which are more likely to be linearly separable) and for non-linear datasets Thet et al. (2020). Its better performance in comparison to other classifiers is evident in several studies. However, if suitable features are selected, KNN or RF can achieve better accuracy.

3. For small datasets with a small set of disease classes, simple methods can achieve good results.
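To make the GLCM features mentioned in the first take-home message more concrete, the following is a minimal pure-Python sketch of how a grey-level co-occurrence matrix and a few texture descriptors can be computed. The function names (`glcm`, `glcm_features`), the toy image and the choice of descriptors (contrast, angular second moment, and one common definition of homogeneity) are our own illustrative assumptions, not the pipeline of any surveyed paper; in practice an image-processing library would normally be used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_features(matrix):
    """Texture descriptors computed from the normalised co-occurrence matrix."""
    total = sum(sum(row) for row in matrix)
    contrast = asm = homogeneity = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            p = count / total                      # joint probability of (i, j)
            contrast += (i - j) ** 2 * p           # large for abrupt changes
            asm += p ** 2                          # angular second moment
            homogeneity += p / (1 + abs(i - j))    # one common definition
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


# Hypothetical 4x4 image quantised to 4 grey levels (illustration only).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_glcm = glcm(toy_image, levels=4)       # horizontal neighbour, distance 1
toy_features = glcm_features(toy_glcm)
```

The resulting feature dictionary would then be passed, possibly together with colour or shape features, to a shallow classifier such as SVM or KNN.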

5.2 Deep learning

Deep learning is a rising branch of machine learning which consists of different architectures and associated learning algorithms. For leaf disease classification, most deep learning models and algorithms are based on neural networks with many hidden layers. We categorise deep learning approaches for this task into deep neural networks, convolutional neural networks for image classification, and convolutional neural networks for object detection and classification. Table 5 provides a summary of recent deep learning approaches for leaf disease classification (if there is a comparison, the highest accuracy is in bold).

Table 5 Summary of deep learning approaches

5.2.1 Deep neural nets

Deep neural networks are neural networks with multiple hidden layers, one on top of another. Previously, training such deep structures was difficult due to the problem of vanishing/exploding gradients, but current learning techniques can turn that curse into a blessing, thanks to the availability of big data and powerful computing resources. We can use deep neural nets as a classifier, similar to shallow learning approaches. In Jana et al. (2021), deep belief networks (DBN) were studied, together with other variants of multi-layer feedforward neural networks, for pepper leaf disease classification. The models were evaluated on two datasets. The first dataset is self-collected, consisting of 1500 images of healthy and diseased leaves. The other dataset contains 300 healthy and 35 diseased images from Plant Village. All samples are resized to 256 × 256 pixels. The features used in this study were Gray Level Co-occurrence Matrix (GLCM) features. The average accuracy and F1-score of DBN are 91.956% and 0.77546, respectively, which is slightly better than the compared networks.
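Accuracy and F1-score can diverge noticeably on imbalanced data such as the second dataset above (300 healthy versus 35 diseased images). The sketch below illustrates this with invented predictions; only the class counts mirror that dataset, while the predicted labels, the function name `accuracy_and_f1` and the choice of the diseased class as the positive class are assumptions made for illustration and are not results from Jana et al. (2021).

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for a binary task with the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# 0 = healthy, 1 = diseased. Hypothetical classifier output:
# 295 healthy correct, 5 healthy flagged as diseased,
# 20 diseased detected, 15 diseased missed.
y_true = [0] * 300 + [1] * 35
y_pred = [0] * 295 + [1] * 5 + [1] * 20 + [0] * 15

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, F1 = {f1:.4f}")
```

In this toy case the accuracy is about 94% while the F1-score for the diseased class is only about 0.67, which is why both metrics are worth reporting when the classes are unbalanced.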

Feature engineering appears to be of limited use in deep learning, as deep models themselves are effective feature extractors. Instead of working in two stages (feature extraction followed by classification), deep convolutional neural networks (CNN) can learn discriminative features that are useful for classification in an end-to-end manner.

5.2.2 Image classification CNNs

CNN is a class of neural networks where spatial information from image structure is represented and learned through convolution operations. CNNs have been used extensively in image processing and computer vision, especially in classifying images, and therefore have been useful for leaf disease classification as well.

5.2.2.1 Off-the-shelf CNNs

There is a plethora of convolutional neural networks developed to tackle a wide range of problems in image classification. One can easily pick a model and apply it to classify diseases from leaf images.

5.2.2.2 LeNet and GoogLeNet

LeNet Lecun et al. (1998) is one of the earliest CNNs. Although it does not have a very deep architecture, its convolution idea is the inspiration for many of today's deep CNN models. In Kawatra et al. (2020), LeNet achieved the lowest accuracy (94.0%) compared to other approaches on Plant Village. A newer network, called GoogLeNet (also known as Inception V1), was developed with improvements over LeNet and several novel components added, such as batch normalization, image distortions, and more layers. In Vijaykanth Reddy and Sashi Rekha (2021), GoogLeNet achieved 95.69% accuracy and ranked third among 7 CNN models for apple disease classification. In Zhang et al. (2018), it achieved 98.9% accuracy for the classification of maize leaf diseases.

5.2.2.3 AlexNet

As one of the earliest deep CNN models, AlexNet has been employed in multiple studies of leaf disease classification Geetharamani and Pandian (2019); Ashok et al. (2020); Agarwal et al. (2019); Anandhakrishnan and Jaisakthi (2020); Huang et al. (2020); Kawatra et al. (2020). For tomato diseases, AlexNet achieved promising results, such as 95.75% accuracy in Ashok et al. (2020) and 90.1% accuracy in Anandhakrishnan and Jaisakthi (2020) (the two studies used different testing partitions). AlexNet was also reported to have 86.5% accuracy for grape diseases in Agarwal et al. (2019). Although AlexNet was a popular model, its performance was usually inferior to that of other deep CNNs. To improve it, Kawatra et al. (2020) proposed a hybrid approach combining AlexNet and a linear SVM, which boosted the accuracy to 99.98% on the Plant Village dataset. This is significantly better than AlexNet alone (94.3%), ResNet50 (98.06%), VGG-16 (98.76%), and Inception V3 (99.08%).

5.2.2.4 VGG

Very Deep Convolutional Networks, known as VGG or VGGNet, explore how to effectively increase the depth of CNNs. VGG-16 (VGG with 16 layers) has been applied to tomato leaf datasets Agarwal et al. (2020); Anandhakrishnan and Jaisakthi (2020). In Agarwal et al. (2020), a pre-trained model achieved 77.2%. In Anandhakrishnan and Jaisakthi (2020), a better training approach was proposed, with a much higher accuracy of 90.1%. A deeper version, VGG-19, was employed in Bir et al. (2020) to classify tomato leaf diseases with 96.86% accuracy. Hu et al. (2021) used VGG-16 for severity analysis, and their proposed model gained 91.22% accuracy. Sujatha et al. (2021) applied VGG-16 and VGG-19 to a citrus leaf disease dataset. Notably, VGG-16 has been applied widely to grape leaf images Agarwal et al. (2019); Huang et al. (2020); Thet et al. (2020). Thet et al. (2020) tested VGG-16 on their private grape leaf disease dataset (5 leaf diseases and 1 healthy category, 6000 images). They modified VGG-16 by replacing the last two fully connected layers with a Global Average Pooling layer. The results showed that the proposed model has the best accuracy (98.4%), significantly better than the standard VGG-16 and the combination of VGG-16 with an SVM classifier.
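The modification described for Thet et al. (2020) relies on global average pooling, which reduces each feature map to a single number and has no trainable weights. The sketch below shows the operation in pure Python and, for scale, the number of parameters in the two large fully connected layers of a standard VGG-16. The parameter counts assume the standard VGG-16 with a 224 × 224 input, whose last convolutional block outputs 512 maps of size 7 × 7; the input size and exact layer layout used by Thet et al. (2020) are not stated here, so the numbers are indicative only. The function name is our own.

```python
def global_average_pooling(feature_maps):
    """Collapse each H x W feature map (a list of rows) into its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


# Two hypothetical 2x2 feature maps -> one value per map.
pooled = global_average_pooling([
    [[1, 2], [3, 4]],
    [[0, 0], [0, 8]],
])

# Parameters (weights + biases) of the two 4096-unit fully connected layers
# in standard VGG-16, assuming a 7 x 7 x 512 input to the first of them.
fc1_params = 7 * 7 * 512 * 4096 + 4096
fc2_params = 4096 * 4096 + 4096
gap_params = 0  # global average pooling has nothing to learn

print(pooled, fc1_params + fc2_params, gap_params)
```

Under these assumptions the two dense layers hold roughly 120 million parameters, so replacing them with a parameter-free pooling layer makes the model far more compact and may also reduce over-fitting on a dataset of a few thousand images.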

5.2.2.5 Inception

Inception is a class of CNNs that utilises Inception modules for deeper structure with more efficient computation. In leaf disease classification, Inception V3 was the most popular among the different versions of Inception networks. It was employed for tomato leaf diseases Agarwal et al. (2020). In Krishnamoorthy and Parameswari (2021), Inception V3 achieved 95.41% on a rice disease image dataset, better than VGG-16 and ResNet-50. For the benchmark Plant Village dataset, Inception V3 was reported to reach 98.42% Hassan et al. (2021) and 99.74% Sai (2021). Again, the results differ because of the different partitions used for training, validation, and test.

5.2.2.6 ResNet

Among many deep CNN models, ResNet is a powerful structure that allows models with a large number of layers to be trained to gain superior performance. ResNet-50 achieved 98.40% accuracy for tomato leaves Anandhakrishnan and Jaisakthi (2020). Guan (2021) applied ResNet to achieve 82.78% on a modified Plant Village dataset. For Betelvine leaf disease, Kumar et al. (2020a) showed that ResNet-34 outperformed other models with 99.40% accuracy and 96.51% F1-score. These are much better than SVM (50.69% & 50.57%), Decision Tree (72.23% & 72.02%), Logistic Regression (80.99% & 80.88%) and K-NN (87.86% & 88.06%). Another version, ResNet-20, achieved 92.76% on apple leaf images Vijaykanth Reddy and Sashi Rekha (2021). Recent works integrate the idea of residual blocks from ResNet with the Inception module Hassan et al. (2021) to create InceptionResNetV2. Such a combination increases the performance from 98.42% to 99.11% on the Plant Village dataset.

5.2.2.7 MobileNet and EfficientNet

Besides the very deep models discussed above, some compact architectures were also employed, driven by the increasing demand for IoT and hardware devices in plant pathology. For example, MobileNet can predict grape leaf diseases with 86% accuracy Huang et al. (2020). In Surya and Gautama (2020), MobileNet was applied to predict diseases from Cassava leaves Mwebaze et al. (2019). This public dataset has 1 healthy and 5 disease classes and was split into a training set (5656 images), a validation set (1889 images) and a test set (1885 images). All images are resized to 224 × 224 pixels. The proposed MobileNet model gained 85.38% accuracy. In Agarwal et al. (2020), MobileNet was shown to achieve 63.75% on tomato leaf images. In Chowdhury et al. (2021), the authors employed three sub-models (B0, B4, B7) of EfficientNet to classify tomato leaf diseases (Plant Village’s 10 tomato categories). The study considered three classification tasks: binary classification (healthy or unhealthy), six-class classification (healthy plus the 9 diseased categories grouped into 5 classes, i.e., bacterial, fungal, viral, mold, and mite disease) and ten-class classification (1 healthy and 9 diseased). All images were resized to 224 × 224 pixels and data augmentation was applied. The evaluation was carried out with 5-fold cross-validation. The results showed that for binary classification and six-class classification, EfficientNet-B7 had the best performance with an accuracy of 99.95% and 99.12%, respectively. For the ten-class classification, EfficientNet-B4 performed better than the other models with an accuracy of 99.89%.
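As a reminder of what 5-fold cross-validation involves, the sketch below generates the train/test index splits in pure Python. The helper name `k_fold_indices`, the fixed seed and the sample count are illustrative assumptions; the exact splitting procedure of Chowdhury et al. (2021), such as whether the folds were stratified by class, is not described here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 23 images.
splits = list(k_fold_indices(23, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(test_idx)} test")
```

Each image is used for testing exactly once, and the reported accuracy is typically the mean over the k folds, which makes the estimate less sensitive to one particular train/test partition.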

5.2.2.8 Custom CNN

Although off-the-shelf CNN models were shown to be useful for leaf disease classification, they were originally designed and tested for general image classification tasks using benchmarking datasets that differ considerably from leaf images. Therefore, they may not be optimal for this specific task, and a custom CNN model may suit a given dataset better. Many researchers customised and developed their own CNN models, either from scratch or by modifying existing ones. In Agarwal et al. (2020), a new CNN model was developed to classify tomato leaf diseases (extracted from Plant Village). They compared the proposed CNN model with MobileNet, VGG-16 and InceptionV3. The proposed model’s accuracy is 91.2%, better than the others, and its storage space is the smallest (1696 KB). Sardogan et al. (2018) also studied tomato leaves from Plant Village. They used a CNN model with the Learning Vector Quantization (LVQ) algorithm to classify the diseases. The model achieved 86% average accuracy. Another variant of CNNs was proposed in Bhowmik et al. (2020) to classify two tea leaf diseases. The precision of this model was approximately 95.93%. In Sunil et al. (2020), the authors designed a new Multi Convolutional Layered-based CNN model and applied it to three sub-datasets (Peach, Pepper, and Strawberry) from Plant Village. They showed that their CNN can effectively classify the leaves of the three sub-datasets with accuracy from 87.47% to 99.25%. The CNN model in Anandhakrishnan and Jaisakthi (2020) was developed based on the Xception V4 architecture and was compared with several common pre-trained models, including VGG-16, ResNet-50, AlexNet and LeNet. The dataset used in this study comprises 10 classes of tomato leaves from Plant Village, where 14528 images were split into 80% for training and 20% for testing. The experimental results (in accuracy) are: the proposed model (99.45%), AlexNet (90.1%), LeNet (88.3%), ResNet-50 (98.40%) and VGG-16 (90.1%). Huang et al. 
(2020) tested a vanilla CNN and three pre-trained models (VGG-16, MobileNet and AlexNet). Finally, they built an ensemble model (average voting method) which achieved a perfect accuracy of 100%.
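One common reading of "average voting" is to average the class probabilities predicted by each model and pick the class with the highest mean. The sketch below follows that reading; it is an assumption on our part, since the exact ensembling rule of Huang et al. (2020) is not detailed here, and the function name and probability values are hypothetical.

```python
def average_voting(prob_lists):
    """Average per-class probabilities over models; return (class, means).

    prob_lists holds one list of class probabilities per model,
    all for the same input image.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    mean_probs = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: mean_probs[c])
    return predicted, mean_probs


# Hypothetical outputs of three models for one leaf image and three classes.
model_outputs = [
    [0.6, 0.3, 0.1],   # model A prefers class 0
    [0.2, 0.5, 0.3],   # model B prefers class 1
    [0.1, 0.7, 0.2],   # model C prefers class 1
]
predicted_class, mean_probs = average_voting(model_outputs)
print(predicted_class, mean_probs)
```

Averaging lets confident models outweigh uncertain ones, in contrast to majority voting, where each model contributes a single vote regardless of its confidence.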

A stacking approach was developed in Guan (2021), aiming to create an effective way to improve classification accuracy. The dataset in this work is from AI-Challenger 2018 (which was modified from Plant Village) and contains 10 different plant species and 61 classes. The dataset was split into a training set (31718 images) and a test set (4540 images). After data augmentation, four models (Inception Network, ResNet, Inception-ResNet and DenseNet) were trained on the training set and then stacked. The stacking method achieved 87% accuracy, better than ResNet (82.78%), Inception Net (82.22%), DenseNet (83.44%) and Inception-ResNet (84.07%).

Another idea is to employ a hybrid approach between deep learning and shallow learning, where the deep model plays the role of a feature extractor Kawatra et al. (2020). In this work, AlexNet was combined with a linear SVM to classify diseases in the Plant Village dataset (resized to 227 × 227 pixels). The experimental results showed that the proposed model gained 99.98% accuracy, better than the basic AlexNet (96.34%) and AlexNet with a Global Average Pooling layer (97.29%). In addition, they evaluated different optimizers (AdaMax, AdaDelta, Adam, RMSProp, SGD, AdaGrad) and showed that AdaMax has the best performance in this study.

5.2.3 Object detection and classification CNNs

In real-life scenarios, it would be useful if a system could detect leaves against cluttered backgrounds and classify their diseases. In this case, image segmentation can be applied as a first stage to extract the leaf area before applying CNNs for image classification, as discussed in the previous section. However, it would be more convenient to have an end-to-end approach where CNNs can both detect leaves and identify diseases. In Xie et al. (2020), the authors employed a Faster Region-based CNN (R-CNN) model to detect and classify grape leaf disease with the best accuracy of 81.1%. Faster R-CNN was also the model of interest in Singh et al. (2020a) for an evaluation on the PlantDoc dataset. They claimed that fine-tuning Faster R-CNN with InceptionResnetV2 and MobileNet can reduce the classification error significantly. Hu et al. (2021) proposed a model based on Faster R-CNN to detect tea leaf blight (TLB) and used VGG-16 for the severity analysis. The disease classification dataset has 398 images, of which 80 made up the test set. The severity analysis dataset contains 270 mildly diseased leaf images in the training set (increased to 700 after augmentation) and 100 in the test set, together with 700 severely diseased leaf images in the training set and 100 in the test set. The proposed model gained 91.22% accuracy. Lakshmi and Nickolas (2020) studied another variant of R-CNN, namely Mask R-CNN. They improved Mask R-CNN with ResNet50 and a Feature Pyramid Network as key components to classify Betelvine leaf diseases. For evaluation, a private dataset was collected from real cultivated Betelvine crops, containing two diseases, Anthracnose (358 images) and Phytophthora (456 images), and 1 healthy category (200 images). All images are resized to 256 × 256 pixels. The proposed Mask R-CNN model achieved 84.07% F1-score, which is better than Faster R-CNN (74.32%) and the original Mask R-CNN (83.11%).

5.2.4 Comparison between DL and SL

Table 6 Comparison between ML & DL

Early applications of deep learning attempted to integrate deep models with feature extraction. For example, in Ramya et al. (2021) and Ashok et al. (2020), the authors employed hand-crafted features for image segmentation before training CNNs to classify the tomato leaf diseases. In particular, Ramya et al. (2021) employed k-means clustering for feature extraction, coupled with CNNs to estimate disease severity, although their results are not clearly detailed. In Ashok et al. (2020) Discrete Wavelet Transform (DWT) and grey-level co-occurrence matrix (GLCM) features were used to segment leaves from the background which helped a CNN model to achieve 98.12% accuracy, better than AlexNet (95.75%) and traditional (shallow) neural networks (92.94%).

Comparisons between SL and DL methods have been carried out extensively in recent years. When the two are applied to the same datasets, the performance of DL methods tends to be superior. Deep learning approaches, such as CNNs, are very effective in image classification where abundant data is available, as CNNs can extract discriminative features from images automatically. Therefore, the descriptiveness of the feature extractors used in shallow learning can be a bottleneck for classifying leaf diseases from images. We show the details of the current comparison in Table 6 (if there is a comparison, the highest accuracy is in bold). Sujatha et al. (2021) compared the performance of SVM, RF, Stochastic Gradient Descent (SGD), Inception-V3, VGG-16 and VGG-19 on the citrus leaf disease dataset. Using 10-fold cross-validation, the three deep learning methods were shown to outperform their shallow counterparts. A study in Sharma et al. (2020) compared logistic regression (LR), KNN, and SVM with CNN on the Plant Village dataset. The shallow learning methods in this work used K-means clustering as the feature extractor. The experimental results demonstrated that CNN was far more accurate (98% accuracy) than the other ML methods (around 60%). A deeper study is presented in Saraswathi et al. (2021), where the authors analysed the weaknesses of several shallow learning methods, including K-Means, (shallow) artificial neural networks (ANN), Naïve Bayes, SVM, and KNN. In the empirical results, K-Means and ANN have quite low accuracy, and Naïve Bayes has a slow convergence rate. Meanwhile, SVM achieves relatively poor performance and KNN has some dimensionality issues. This analysis led to an investigation into a system based on CNNs to improve the performance. As expected, the proposed CNN achieved the best accuracy (96%). Agarwal et al. (2019) used general data augmentation methods, i.e. 
zooming, inversion, flipping and rotation, to keep the training free from bias towards any particular class (i.e., to balance the data). In this work, the CNN model also achieved the best accuracy of 99%. This is better than the other pre-trained models they tested (AlexNet: 86.5%, VGG-16: 97.5%), and also better than the shallow learning approaches (Decision Tree, Naive Bayes, SVM, LDA, KNN, LR and RF). Among the shallow learning models, RF with the HSV-histogram feature achieved the best result (97.5%). The proposed CNN model in Geetharamani and Pandian (2019) can classify leaf diseases with 97.87% accuracy, better than the popular transfer learning approaches (AlexNet, VGG-16, Inception-v3 and ResNet) and shallow learning approaches (SVM, logistic regression, decision tree and K-NN). In another work Kumar et al. (2020a), the authors employed Residual Networks (ResNet34) to construct a custom model with 99.40% accuracy and 96.51% F1-score. These results significantly surpass those of shallow learning models: SVM (50.69% & 50.57%), Decision Tree (72.23% & 72.02%), Logistic Regression (80.99% & 80.88%) and K-NN (87.86% & 88.06%). Vijaykanth Reddy and Sashi Rekha (2021) used their proposed approach (integrating CNN with AlexNet and GoogLeNet cascade inception) to classify apple leaf diseases. Their proposed model gained 97.62%, better than shallow learning methods including SVM (68.73%) and Back Propagation (54.63%).

From multiple studies comparing shallow learning and deep learning, some researchers concluded that deep learning approaches based on CNN architectures can be more suitable and effective for leaf disease classification than shallow learning approaches Sharma et al. (2020). As we can see, CNNs do not require manual pre-processing or feature extraction, which may cause side effects Sharma et al. (2021); Bir et al. (2020), although such steps can shorten the training time and reduce the computation needed for shallow learning. Table 6 clearly shows that CNNs outperform shallow learning by a significant margin. However, if the dataset is small, shallow learning can be more useful Zhang et al. (2018). For deep learning to be effective, the quantity of data should be sufficient. In the next section, we will show how augmentation has emerged as a great tool to deal with the data availability problem.

Take-home messages

\(\bullet\) Deep learning models are useful for leaf disease classification and should be recommended in real-life applications due to their high accuracy. The common off-the-shelf deep learning models are CNN, AlexNet, VGG-16, ResNet, EfficientNet, Inception and MobileNet.

\(\bullet\) Custom CNNs are highly encouraged as we should design an optimal model for different tasks. It was evident that custom CNNs perform better than off-the-shelf models.

\(\bullet\) Deep learning is more effective than shallow learning in leaf disease classification. It is also more convenient as we can get rid of the feature extraction steps and minimise the manual effort for data processing.

\(\bullet\) Compared with Table 4, we can see that the datasets used in deep learning papers were relatively larger than in other studies. This is consistent with the fact that deep learning models are usually data-hungry.

\(\bullet\) Most of the studies focus on the performance (accuracy) aspect of the task, while a more comprehensive comparison covering compactness and efficiency is still missing. A few papers have addressed these issues: for example, Sharma et al. (2020) evaluates models’ speed and Agarwal et al. (2020) evaluates models’ storage space.

\(\bullet\) Different studies use different experiment settings, including different partitions for training/validation/test which makes their results difficult to compare. Therefore, a benchmarking study is needed.

5.3 Augmented learning

5.3.1 Data augmentation

As mentioned previously, the main obstacle to this research is the availability of datasets Chouhan et al. (2020); Kumar et al. (2022); Li et al. (2021); Agarwal et al. (2021). Researchers often have to deal first with the problem of not having enough data (i.e., small datasets), and data augmentation is an effective method to address it. Data augmentation can be seen as analogous to human imagination or dreaming, where we simulate different scenarios based on our experience to anticipate unobserved events Shorten and Khoshgoftaar (2019).

Many research results have already confirmed the effectiveness of data augmentation in leaf disease classification Geetharamani and Pandian (2019); Mureşan et al. (2020); Naik et al. (2022); Shaji and Hemalatha (2022); Lamba et al. (2022); Nagaraju et al. (2022). Table 7 shows the common data augmentation methods in leaf disease classification. Data augmentation has several purposes, as follows: (i) enrich a dataset by increasing its volume Guan (2021); Moyazzoma et al. (2021); (ii) mitigate the data imbalance problem Li et al. (2021); Hu et al. (2021); (iii) improve the generality to reduce the over-fitting issue and make machine learning models more robust Guan (2021); Moyazzoma et al. (2021); Mureşan et al. (2020); Nagaraju et al. (2022). Generally, in leaf disease classification the common data augmentation approaches (including physical expansion Li et al. (2021) and position and colour augmentation Naik et al. (2022)), are widely used thanks to their convenience and simplicity. There are many existing functions and tools available for position augmentation, such as Pytorch’s transforms function (torchvision.transforms) Kaushik et al. (2020) and the Augmentor python library Agarwal et al. (2020). Position augmentation methods mean changing the image’s position, shape, size and so on. Rotating (rotation) is the most used method, as can be seen in Geetharamani and Pandian (2019); Agarwal et al. (2019); Huang et al. (2020); Zhang et al. (2018); Moyazzoma et al. (2021); Paymode et al. (2021); Chowdhury et al. (2021); Saraswathi et al. (2021); Kaushik et al. (2020); Agarwal et al. (2020); Jepkoech et al. (2021); Bir et al. (2020); Hu et al. (2021); Naik et al. (2022); Shaji and Hemalatha (2022); Lamba et al. (2022); Bhujel et al. (2022). Here, the method rotates leaf images to different angles (e.g. 30\(^\circ\), 90\(^\circ\) or 180\(^\circ\) ) to produce new samples. 
After rotating, we can apply other techniques to generate more samples, such as flipping Geetharamani and Pandian (2019); Agarwal et al. (2019); Huang et al. (2020); Moyazzoma et al. (2021); Paymode et al. (2021); Saraswathi et al. (2021); Kaushik et al. (2020); Agarwal et al. (2020); Jepkoech et al. (2021); Bir et al. (2020); Naik et al. (2022); Shaji and Hemalatha (2022); Lamba et al. (2022); Bhujel et al. (2022), zooming/scaling Agarwal et al. (2019); Huang et al. (2020); Bir et al. (2020); Lamba et al. (2022); Bhujel et al. (2022); Geetharamani and Pandian (2019); Zhang et al. (2018); Chowdhury et al. (2021); Kaushik et al. (2020); Naik et al. (2022), cropping Paymode et al. (2021); Saraswathi et al. (2021); Agarwal et al. (2020); Naik et al. (2022), vertical or horizontal shearing Bhujel et al. (2022); Huang et al. (2020); Saraswathi et al. (2021), shifting Huang et al. (2020); Saraswathi et al. (2021); Bir et al. (2020); Shaji and Hemalatha (2022); Bhujel et al. (2022), transformation Paymode et al. (2021); Kaushik et al. (2020); Hu et al. (2021); Naik et al. (2022), translation Moyazzoma et al. (2021); Chowdhury et al. (2021); Kaushik et al. (2020); Naik et al. (2022); and resizing Kaushik et al. (2020); Agarwal et al. (2020).
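The basic position augmentations listed above can be sketched in a few lines. The example below works on an image stored as a nested list of pixel values and is limited to 90° rotations and flips so that it needs no external library; the function names are our own, and arbitrary-angle rotation, cropping or shearing would in practice be done with a library such as torchvision.transforms or Augmentor, as mentioned earlier.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def augment(image):
    """Return the original image plus four position-augmented variants."""
    return [
        image,
        rotate90(image),               # 90 degrees
        rotate90(rotate90(image)),     # 180 degrees
        flip_horizontal(image),
        flip_vertical(image),
    ]


# Hypothetical 2x2 single-channel image.
toy_leaf = [[1, 2],
            [3, 4]]
augmented = augment(toy_leaf)
print(len(augmented), "samples generated from 1 image")
```

Because every variant keeps the label of the original image, a small dataset can be enlarged several times over at almost no cost, although the new samples are strongly correlated with the originals.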

Besides position augmentation, researchers also used colour augmentation to process the leaf images, such as changes to brightness, contrast, saturation and hue Naik et al. (2022); Paymode et al. (2021), and Principal Component Analysis (PCA) colour augmentation Geetharamani and Pandian (2019). It is worth noting that there may be pitfalls in the use of colour augmentation techniques for leaf images, as colour is important for identifying diseases. Therefore, we should be careful not to destroy or alter the original features of the leaf images. For example, some researchers used colour augmentation methods to change the colours of leaf images Zhang et al. (2018); Naik et al. (2022); Geetharamani and Pandian (2019), but in Li et al. (2021) the authors pointed out that colour may be one of the most important manifestations of some leaf diseases, so changing the colour features of the original images may have negative effects.

The augmentation methods mentioned above may have limitations such as poor quality, inadequate diversity, and unevenness Li et al. (2021). Recent approaches, including Generative Adversarial Networks (GAN) Goodfellow et al. (2014), employ deep learning to generate artificial data. GAN techniques employ a neural network called the generator to produce images that do not appear in the training set but are realistic enough to fool a second network (the discriminator) into treating them as genuine samples of the training classes. In the case of leaf images, GAN can generate new images for different disease types. Compared to the non-learning methods, GAN-based data augmentation is based on generative modelling and learning, where the focus is on creating artificial samples that retain characteristics similar to those of the original dataset. GAN has been widely used to create more samples recently Li et al. (2021). In Lamba et al. (2022), the original dataset comprises a total of 3941 images, including 1858 images of bacterial blight and 1706 images of leaf blast. After applying GAN augmentation, the dataset size increased to 9101 images, with 3767 images representing bacterial blight and 5034 images representing leaf blast, and the experimental results showed that the accuracy of CNN models can be improved with data generated from GAN.

Besides the texture/colour-based transformation and GAN approaches, some new methods have been developed. For example, Nagaraju et al. (2022) proposed two image augmentation (IA) methods: an image pre-processing & transformation algorithm (IPTA) and an image masking & REC-based hybrid segmentation algorithm (IMHSA). The methods aim to produce a sufficient quantity of training leaf disease images to improve the richness of small datasets. IPTA is an adaptive supervised learning approach to transform the original images into augmented images. IMHSA is an unsupervised approach for RGB image segmentation. The empirical study showed that with augmented data the validation accuracy was raised from 65% to 73%.

Table 7 Common data augmentation technologies in leaf disease classification

5.3.2 Model augmentation (transfer learning)

Table 8 Transfer learning

Transfer Learning (TL) is a technique in machine learning that allows models trained on one task to be adapted to perform another task. It is also a method to augment a learning model by reusing the knowledge learned from other domains for different (but related) tasks. This could be useful in leaf disease classification, as models trained on one type of plant could potentially be adapted to work on other plants. There are many related lines of work in this direction, including domain adaptation and multi-task learning. In practice, however, the most common approach is to take models pre-trained on a huge public dataset (e.g., ImageNet) for other tasks and then adapt them to the target leaf disease dataset (e.g., Plant Village). In Nagaraju et al. (2020), the authors showed that through transfer learning the training time of CNN models can be shortened significantly. This idea has been deployed and studied widely in leaf disease classification. Table 8 lists the recent work on transfer learning methods in leaf disease classification (if there is a comparison, the highest accuracy is in bold).

A study in Bir et al. (2020) adopted several pre-trained deep learning models, including MobileNetV2, EfficientNetB0 and VGG-19, to classify tomato leaf diseases (1 healthy and 9 diseased classes). From the experimental results (MobileNetV2: 97.26% accuracy, EfficientNet-B0: 98.6% accuracy, VGG-19: 96.86% accuracy), they claimed that transfer learning has several advantages: smaller models, lower computational costs, and suitability for mobile devices. In Nagaraju et al. (2020), the authors utilised a pre-trained VGG-16 and fine-tuned it on their collected grape and apple leaf dataset. The model achieved 97.87% accuracy, showing that through transfer learning CNN models’ performance and efficiency can be improved. Another work in Lauguico et al. (2020) pointed out that one leaf may contain multiple leaf diseases in real life; thus, the authors used montage images, combining nine pictures into one, to create leaves which contain multiple diseases. Three pre-trained networks, AlexNet, GoogLeNet and ResNet-18, were tested, achieving 95.65%, 92.29% and 89.49% accuracy, respectively. In Moyazzoma et al. (2021), a pre-trained MobileNetV2 is used to classify 21 classes of healthy and diseased leaves (7800 images, resized to 224 × 224 pixels). Each class has 200 training samples, 100 validation samples and 50 test samples. The transferred MobileNet can predict diseases with 90.38% accuracy. Krishnamoorthy and Parameswari (2021) transferred pre-trained VGG-16, ResNet50 and InceptionV3 to classify rice leaf diseases. The dataset contains 3 leaf diseases and 1 healthy category (resized to 224 × 224 pixels). Each class of the training set has 1000 images and each class of the test set has 300 images. Finally, the fine-tuned VGG-16, ResNet50 and InceptionV3 (with different hyper-parameters) achieved 87.08%, 93.41% and 95.41% accuracy, respectively. Kibriya et al. 
(2021) deployed pre-trained GoogLeNet and VGG-16 for tomato leaf disease classification, with accuracies of 99.23% (GoogLeNet) and 98.00% (VGG-16). A similar study can be seen in Meeradevi et al. (2020), where the authors transferred a pre-trained VGG-16 to classify tomato leaf diseases. They tested several variants of VGG-16: (i) a fresh VGG-16 (trained from scratch); (ii) a classic transfer learning VGG-16 pre-trained on ImageNet; (iii) a pre-trained VGG-16 with dropout and L2 regularization; and (iv) a pre-trained VGG-16 with dropout and an attention module. They reported that variant (iv) improved accuracy and reduced validation loss more effectively than the other variants. The model proposed in Sharma et al. (2021) is based on a pre-trained ResNet50. Only its last layer was fine-tuned, and a global average pooling layer was added with two 512-neuron dense layers on top. The result of this model, a 98% F1-score, shows the advantage of transfer learning. Kaushik et al. (2020) presented a pre-trained ResNet-50 with a data augmentation method to detect and classify 6 categories of tomato leaf diseases (Plant Village). The dataset was increased by four times through data augmentation. They showed that the proposed ResNet-50 model reached 97% accuracy after fine-tuning the transferred model. In Hassan et al. (2021), the authors transferred the common pre-trained models InceptionV3, InceptionResNetV2, MobileNetV2 and EfficientNetB0, with a depthwise separable convolution method, to classify diseases on the entire Plant Village dataset. The input size was set to 224 × 224 pixels, and the dataset was split with three different test set proportions: 20%, 30% and 40%. Compared with the other models, EfficientNetB0 achieved the best accuracy of 99.56% on the test set. They observed that the choice of split had little impact on the results.
Using a smaller subset (5 types of crops from Plant Village), Sai (2021) tested fine-tuned MobileNet and InceptionV3 models. In this work, the leaf images were all pre-processed with a segmentation method, and the two models achieved 99.62% and 99.74% accuracy, respectively.
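
The pattern that recurs in these studies, keeping a pre-trained feature extractor fixed and training only a new classification head, can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: `FROZEN_W` stands in for pre-trained backbone weights, the two-number inputs stand in for images, and `backbone`, `train_head` and `predict` are hypothetical names. It is not the method of any cited paper; a real study would use a deep learning framework and a CNN pre-trained on a dataset such as ImageNet.

```python
import math

# Hypothetical stand-in for pre-trained backbone weights (never updated below).
FROZEN_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def backbone(x):
    """Frozen feature extractor: ReLU(W x) with fixed weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    feats = [backbone(x) for x in samples]  # backbone is used, not trained
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            g = p - y                        # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(head, x):
    """Classify one input with the frozen backbone plus the trained head."""
    w, b = head
    z = sum(wi * fi for wi, fi in zip(w, backbone(x))) + b
    return 1 if z > 0 else 0


# Toy target task (assumed): label 1 when the first input exceeds the second.
samples = [(2.0, 0.0), (3.0, 1.0), (1.0, 0.0), (0.0, 2.0), (1.0, 3.0), (0.0, 1.0)]
labels = [1, 1, 1, 0, 0, 0]
head = train_head(samples, labels)
```

Only the head parameters `w` and `b` change during training, which is why this style of transfer learning tends to need less data and training time than training the whole network from scratch.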

Take-home messages

1. Both data and model augmentation can help improve the performance and robustness of machine learning approaches for leaf disease classification. Transfer learning has attracted more attention, as pre-trained models can be reused to augment learning on leaf images.

2. Although data augmentation can be useful, some researchers are sceptical about its effect. This is because some data augmentation methods (e.g., random cropping, colour transformation) can change the semantics of original images, which may create misleading images and reduce the performance of classification models Wang et al. (2019).

3. More attention is being paid to transfer learning, and the results reported in Table 8 are satisfactory. This is reasonable as there are abundant pre-trained models on image data available for public use.

4. Combining data augmentation and model augmentation is a promising idea. However, this direction has not been studied properly. We would encourage more studies in this direction.

6 Applications

In this section, we will review different leaf disease classification applications, from prototyping/lab-based products to commercialised software. We categorise the applications into: Web-based apps, Mobile apps, and Devices & Hardware.

Table 9 Various applications
Fig. 5 Various applications of ML technologies

6.1 Web-based apps

Web-based applications are often the first choice of industry and researchers because they are easy to use and not limited by hardware configuration. A user can submit a picture, captured by a camera, from a computer or a mobile phone and get predicted results in real time.

Several examples of web-based apps are shown in Table 9. For example, Plant Disease Identifier (https://cropify.herokuapp.com/) is a website that provides tomato and potato leaf disease classification. A user only needs to choose and submit a picture of the leaf, and will get the predicted result shortly. Another example is a rice disease classification system that can be deployed on a website and WhatsApp (see Fig. 5b). This system can diagnose three diseases of rice (based on a CNN model) and estimate the severity as the percentage of diseased area (based on image segmentation). The dataset used here is the HCI Rice Leaf Diseases Dataset, which contains 136 images of three rice diseases. The accuracy of this system is 85.7% Wadhawan et al. (2020).
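
A severity percentage of this kind can be derived from segmentation output in a few lines. The following is a minimal sketch under stated assumptions, not the implementation of the cited system: it assumes two binary masks of equal size (one marking leaf pixels, one marking diseased pixels), and `severity_percent` is a hypothetical helper name.

```python
def severity_percent(leaf_mask, disease_mask):
    """Percentage of leaf pixels that are also marked as diseased.

    Both masks are assumed to be equally sized 2D lists of 0/1 values.
    """
    leaf_pixels = 0
    diseased_pixels = 0
    for leaf_row, disease_row in zip(leaf_mask, disease_mask):
        for is_leaf, is_diseased in zip(leaf_row, disease_row):
            if is_leaf:
                leaf_pixels += 1
                if is_diseased:
                    diseased_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    return 100.0 * diseased_pixels / leaf_pixels


# Toy 3 x 4 example: 8 leaf pixels, 2 of them diseased.
leaf_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
disease_mask = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
severity = severity_percent(leaf_mask, disease_mask)  # 25.0
```

Dividing by the leaf area rather than the whole image keeps the estimate independent of how much background the photo contains.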

6.2 Mobile apps

In recent years, mobile apps have become more popular. With the development and popularity of smartphones, mobile apps can offer a better user interface and user experience.

There are several examples of mobile apps for leaf disease classification from the industry. CropsAI is an iOS mobile app which can predict the common leaf diseases of 5 species (Corn, Wheat, Tomato, Soybeans & Rice). Plants Disease Identification is a popular iOS mobile app priced at $2.99 on the App Store. Agrio is another mobile app which supports both Android and iOS. It claims to have an AI-based alert system (which needs remote sensors) that notifies subscribed users and provides written preventative measures when it detects or expects diseases or pests. Plantix is an Android mobile app which can classify leaf diseases of 30 main crops. It can provide instant disease classification and treatment advice. Notably, Plantix is reported to have the largest online community of farmers and agricultural specialists in the world Siddiqua et al. (2022). Users of Plantix can gain and share knowledge and help each other. Leaf Doctor was a mobile app created by the University of Hawaii, only available on iOS. Leaf Doctor supports leaf disease classification and provides disease severity estimation (see Fig. 5i–k). A limitation of mobile apps is that the software may be constrained by the operating system and configuration of the smartphone. On a smartphone with a low configuration or an outdated system, an app may run slowly or not work properly.

From the research community, Nalawade et al. (2020) and Paymode et al. (2021) each designed a mobile app for leaf disease classification. The app in Paymode et al. (2021) can classify tomato leaf diseases. Its training dataset was taken from the tomato leaves of Plant Village and the prediction model was based on a CNN. They showed that their app could achieve 97% accuracy. In contrast, the mobile app in Nalawade et al. (2020) provides both disease classification and real-time monitoring of field factors (e.g., temperature, humidity, moisture) (see Fig. 5a). It was based on a CNN model which was trained on part of the Plant Village dataset. The authors demonstrated that their app can achieve 87.43% accuracy on leaf disease prediction.

6.3 Devices and hardware

Devices or custom hardware are often required by professional agricultural specialists or researchers because dedicated hardware can provide more computing power and more reliable performance. Ponnusamy et al. (2020) pointed out that existing deep learning approaches need high processing power and may not be suitable for low-budget mobile devices. However, a high configuration requires more capital investment, professional technical skills and training. We show some examples from research as follows. In Nooraiyeen (2020), a robotic vehicle was designed and developed (see Fig. 5c) to detect Basil/Tulsi leaf diseases. Its components include a microcontroller, a Bluetooth module, a camera module and a remote computer system. The image detection module of the system used K-means clustering and an SVM classifier implemented in MATLAB. Users can get the prediction result from the software interface (see Fig. 5d). In Chouhan et al. (2021), a novel framework (named IoT_FBFN) was proposed. This framework is based on a Fuzzy Based Function Network (FBFN) with IoT technology. It captures real-time leaf images through a Raspberry Pi camera and transmits them over the internet to the system, where the FBFN classifies diseases. The system was trained using a dataset of about 470 trees planted alongside the road in India. The authors demonstrated that the proposed system can achieve 80.66% average specificity and 80.18% average sensitivity, better than K-means and SVM. A handheld device (Embedded Platform) system was developed in Gajjar et al. (2021) (see Fig. 5f). With this handheld device, the classification accuracy can reach 96.88%. The device first detects leaves using a camera, then divides the image and localises the leaves through data annotation and MobileNet. This module was trained on 338 leaf images the authors collected, 52 images found online and 111 images from Plant Village.
Finally, a custom CNN was used to classify diseases. This CNN was trained on 20 categories of Plant Village (apple, corn, potato & tomato). The system is reasonably robust to varying conditions (e.g., weather, illumination & background). An interesting device, named Smart Glass, was developed in Ponnusamy et al. (2020). This wearable device can be more convenient than the hand-held devices mentioned previously. It is based on a Raspberry Pi Zero W and can identify whether a leaf is healthy or not in real time (see Fig. 5e). The classification module used in Smart Glass is a transfer learning approach with a YOLOv3 + CNN architecture fine-tuned on 304 tomato leaf images from farms (split into two categories: healthy and unhealthy). The proposed model can achieve an average accuracy of 82.38%.
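
Some of the systems above report average sensitivity and specificity rather than accuracy. As a reminder of how these quantities relate to a multi-class confusion matrix, here is a minimal sketch. It assumes that "average" means an unweighted (macro) average over classes, which the cited work may or may not have used; the function name and the example matrix are hypothetical.

```python
def macro_sensitivity_specificity(confusion):
    """Macro-averaged sensitivity and specificity.

    confusion[i][j] is the number of samples of true class i predicted as
    class j. Every class is assumed to have at least one true sample and at
    least one sample belonging to another class.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sensitivities, specificities = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k, missed
        fp = sum(confusion[r][k] for r in range(n)) - tp  # others, called k
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return sum(sensitivities) / n, sum(specificities) / n


# Made-up 3-class confusion matrix with 10 samples per true class.
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
sensitivity, specificity = macro_sensitivity_specificity(confusion)
# sensitivity is about 0.767 and specificity about 0.883 for this matrix
```

Reporting both values is informative for disease screening, where missing a diseased leaf (low sensitivity) and raising false alarms (low specificity) carry different costs.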

Besides hand-held and wearable devices, Unmanned Aerial Vehicles (UAVs) are attracting more attention Ahmed and Reddy (2021); Albattah et al. (2022). UAVs have great potential in agriculture in the future. In Albattah et al. (2022), a team equipped a drone (a DJI Phantom 3 quadcopter) with a pre-trained EfficientNetV2-B4 to detect leaf diseases. The classification module was trained on Plant Village and achieved near-perfect accuracy of 99.99%. In the industry, the American company Agremo started using drones to detect leaf diseases and weeds in sugar beet farms. Drones are especially suitable for continuous inspection and work on large-scale farms. The company claims that its drones can provide plant counts, location data of certain weeds and diseases, and irrigation problem identification (water stress). The data collected by drones can easily be visualised to help farmers analyse leaf diseases, weeds, water issues and so on.

Take-home messages

1. A wide range of apps and devices have been built using machine learning techniques (mostly deep learning).

2. Mobile apps are becoming more popular than web apps for individual users thanks to their compactness and mobility. Meanwhile, UAVs (drones) have potential in large-scale farming. Some prototypes of hand-held and wearable devices were tested but they are not ready for commercialisation.

7 Conclusions

Although machine learning techniques have been widely used in leaf disease classification, to the best of our knowledge, a comprehensive and up-to-date survey covering the related available data, techniques and applications is still desired by the industry and research community. Therefore, in this paper, we surveyed about 100 recent related articles, collected and listed a series of public datasets available for research, analysed state-of-the-art machine learning approaches (i.e., shallow learning, deep learning & augmented learning) and reviewed feasible applications in academia and industry. We have the following findings. In the data part, the Maize Leaf (NLB) dataset could be the largest public dataset of a single plant species at present, while Plant Village is the most popular dataset. Plant Village, Plant Leaves and Plantae_K are all laboratory datasets which can be useful for prototyping and evaluating machine learning models. However, real-field datasets, including PlantDoc, would provide a more comprehensive evaluation and support for realistic applications. For technologies, shallow machine learning requires feature extraction from images Applalanaidu and Kumaravelan (2021) to be useful for the disease classification task. The two most common methods are K-means clustering and the grey-level co-occurrence matrix (GLCM), of which GLCM is more recommended. A combination of features is also encouraged, as it can help improve performance. Support vector machine (SVM) was the most common method for leaf disease classification in shallow machine learning. It is well suited to both smaller (more likely to be linear) and non-linear datasets Thet et al. (2020). Its better performance in comparison to other classifiers is evident in several studies. However, if suitable features are selected, KNN or RF can also achieve better accuracy.
Deep learning models have proven more effective than shallow learning for leaf disease classification and, given their high accuracy, should be recommended for real-life applications. They are also more convenient, as the feature extraction steps can be skipped and the manual effort for data processing is minimised. The common off-the-shelf deep learning models are CNN, AlexNet, VGG-16, ResNet, EfficientNet, Inception and MobileNet. Custom CNNs are highly encouraged, as an optimal model should be designed for each task. It was evident in the surveyed studies that custom CNNs perform better than off-the-shelf models. We can see that the datasets used in deep learning papers were relatively larger than in other studies. This is consistent with the fact that deep learning models are usually data-hungry. Most of the studies focus on the performance (accuracy) aspect of the task, while a more comprehensive comparison covering compactness and efficiency is still missing. A few papers addressed these issues; for example, Sharma et al. (2020) evaluated the speed of models and Agarwal et al. (2020) evaluated their storage space. Recent research showed that both data and model augmentation methods can help improve the performance and robustness of deep learning for leaf disease classification. More attention is on transfer learning, where pre-trained models can be reused to augment learning on leaf images. Although data augmentation can be useful, some researchers are sceptical about its effect. The reason may be that some data augmentation methods (e.g., random cropping, colour transformation) can change the semantics of original images, which may create misleading images and reduce the performance of classification models Wang et al. (2019). The popularity of transfer learning is reasonable as there are abundant pre-trained models on image data (e.g., ImageNet) available for public use now.
For applications, Section 6 showed that a wide range of applications (software) and devices (hardware) have been built using machine learning techniques (mostly deep learning). Mobile applications are becoming more popular than web apps for individual users thanks to their compactness and mobility. Meanwhile, UAVs (drones) have advantages and potential in large-scale farming. Some prototypes of hand-held and wearable devices were tested but they may not be ready for commercialisation. Last but certainly not least is the explainability of machine learning methods. With the increasing adoption of machine learning in the agriculture industry, there is a pressing demand for models to be transparent and explainable. This may be important for enabling farmers to understand the decision-making process and trust this new technology.

Based on the above findings, we have the following suggestions.

1. The available datasets listed are useful for domain adaptation and multi-task learning; however, such work is largely missing in the current literature.

2. A machine learning model should learn from different datasets in a compositional manner, so that the model can effectively adapt to new tasks/datasets as they are added.

3. For small datasets with a small set of disease classes, simple methods may achieve good results.

4. Many studies use different experiment settings, including different partitions for training/validation/test, which makes their results difficult to compare. Therefore, a benchmarking study is needed and encouraged.

5. Research on explainability in this area remains worth attention, as the industry still requires a means to effectively explain decision-making by machine learning models to enable user understanding.

6. Combining data augmentation and model augmentation is a promising idea. However, this direction has not been studied properly.
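
As an illustration of the benchmarking suggestion, comparable results require at least a fixed and reproducible partition of each dataset. The sketch below shows one possible way to produce a seeded, class-stratified train/test split. The function name, the default seed and the choice of stratification are assumptions made for illustration and do not describe the protocol of any surveyed study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_indices, test_indices), split per class at the same rate.

    The same labels, fraction and seed always give the same partition.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # local generator, so global state is untouched
    train_indices, test_indices = [], []
    for label in sorted(by_class):  # fixed class order for reproducibility
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_indices.extend(members[:n_test])
        train_indices.extend(members[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Toy label list (assumed): 10 healthy and 10 diseased samples.
labels = ["healthy"] * 10 + ["blight"] * 10
splits = {
    fraction: stratified_split(labels, fraction, seed=42)
    for fraction in (0.2, 0.3, 0.4)
}
```

Publishing the resulting index lists alongside a dataset would let different studies report results on identical training and test sets.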