Machine learning for leaf disease classification: data, techniques and applications

The growing demand for sustainable development has brought a series of information technologies to agricultural production. In particular, machine learning, a branch of artificial intelligence, has produced multiple breakthroughs that can enhance and revolutionize plant pathology approaches. In recent years, machine learning has been adopted for leaf disease classification in both academic research and industrial applications. It is therefore beneficial for researchers, engineers, managers, and entrepreneurs to have a comprehensive view of recent developments in machine learning technologies and applications for leaf disease detection. This study surveys different aspects of the topic, including data, techniques, and applications. The paper starts with publicly available datasets. After that, we summarize common machine learning techniques, including traditional (shallow) learning, deep learning, and augmented learning. Finally, we discuss related applications. This paper provides useful resources for future study and application of machine learning for smart agriculture in general and leaf disease classification in particular.


Introduction
In recent years, Machine Learning (ML) has been emerging as a game changer in multiple aspects of life. In agriculture, machine learning has been widely used as an effective means of production, including but not limited to automatic harvesting machines, production estimation, pest control, weed control, irrigation control, plant pathology (leaf disease classification), and fruit classification. Generally, plant diseases can manifest in different parts of a plant, such as its leaves, flowers and roots. Among them, the leaf is one of the most dominant and pronounced parts, because leaves help provide the nutrients the plant needs to grow through photosynthesis, which takes place in the leaves using sunlight [1]. Some leaf diseases may cause leaves to drop or wither, directly affecting the plant's yield and even its survival. Furthermore, they have wider negative impacts, decreasing crop productivity and raising production costs. In the past, farms generally relied on labourers and experts for routine inspections and disease management. The disadvantages of this approach are obvious. First, a great deal of manpower and cost is required. Second, labourers need training and are easily fatigued by manual work. Third, it is difficult to detect leaf disease in a timely manner and on a large scale. Fourth, diagnosis may be subjective due to human error and bias. Thus, an effective leaf disease classification approach is a basic need for plant cultivation. Fortunately, ML approaches have recently emerged as a better solution than traditional methods, showing their effectiveness and ease of use in plant leaf pathology classification through plant leaf image analysis. Plant leaf images have several advantages. Datasets of leaves are relatively easy to collect, analyse and reproduce (e.g., using a camera). We can also extract useful features (e.g., species, health states, age, and disease categories), which would improve the quality and quantity of agricultural production. Therefore, efficient and timely identification and classification of plant diseases will be the key to remedying the loss of production.

Nowadays, with the introduction of precision agriculture (PA) or smart agriculture (SA) [1][2][3][4][5][6], ML technologies have been researched and employed, especially in plant leaf pathology classification. Combined with Big Data and the Internet of Things (IoT), ML can automatically detect plant leaf diseases as early as possible. Currently, ML applications have been deployed on various hardware and software platforms, e.g., mobile phone applications [7], websites [8] and smart glasses [9]. With the increasing demand for ML in smart agriculture, a comprehensive survey on leaf disease classification will be beneficial to interested researchers and farmers. This paper provides the research and industry communities with useful information on the available data and techniques, their advantages and weaknesses, and their applicability.
In recent years, there has been a growing interest in utilising machine learning for leaf disease classification. Several surveys have been conducted on this research topic; however, we have identified certain limitations within the reviewed works. The scope of the reviewed papers was often narrow, failing to encompass the broader concept of machine learning in leaf disease classification. Additionally, many of the reviewed papers were outdated, indicating a need for more up-to-date research in this area. Furthermore, a comprehensive review of available datasets for leaf disease classification is still lacking. It is also necessary to conduct a thorough review of the various machine learning approaches that have been employed. Currently, recent surveys have predominantly focused on emerging deep learning techniques, such as Convolutional Neural Networks (CNN). However, due to the diverse techniques and datasets used in each survey, it remains challenging to analyze and compare research outcomes. Moreover, while numerous software applications of machine learning for pathology, including leaf-disease analysis, have been developed recently, there is a lack of comprehensive review in this specific domain.
This paper provides a comprehensive view of current achievements and trends in the application of ML for leaf disease classification. Currently, leaf disease classification approaches can be categorised into traditional (shallow) ML, Deep Learning (DL) and Augmented Learning (AL). DL is a branch of ML, and AL is a research topic that aims to improve the effectiveness and usefulness of ML approaches. In shallow learning, feature extraction plays an important role which, in many cases, requires experts' involvement, i.e., to engineer useful features. Deep learning, on the other hand, may reduce the cost of feature engineering, as it can facilitate effective learning over a large amount of data. Although the data-hungry nature of deep learning is sometimes an issue, leaf images are often easy to collect and farmers can help with disease annotation. Nevertheless, to reduce the reliance on labelled data, data augmentation methods have been used to produce more training data and enhance model robustness. Transfer learning is also a promising approach for this task, as it can reduce the need for leaf data by utilising pre-trained models from other tasks. As we can see, the keys to the success of ML approaches are the quality and quantity of data. Therefore, unlike previous surveys, we discuss the availability and quality of public datasets and their suitability for evaluating ML models.
The organisation of the paper is as follows. In the next section, we explain how we collect and analyse related literature. Section 3 discusses the gaps in existing review and survey papers. After that, Section 4 presents the available public datasets for leaf disease classification. This would help researchers to find, apply, and evaluate their ideas quickly. In Section 5, we categorise and compare machine learning approaches by dividing them into three main groups: traditional (shallow) ML approaches, deep learning (DL), and transfer learning (TL). In Section 6, we present related applications available for leaf disease classification in real life. Finally, Section 7 summarises our findings and discusses potential directions for future work on this research topic. This paper aims to provide useful resources for the study and application of leaf disease classification with machine learning.

Table 1:
The publication years of referenced academic articles. The "Tech" column shows the number of technical papers, while the "Review" column shows the number of review papers. The total number of papers we study in each year is in the "Total" column.

Related Work
As interest in leaf disease classification with machine learning has been increasing recently, several surveys related to this research topic have appeared. In this section, we analyse recent review papers about leaf disease detection or classification. Table 2 shows their focus and the gaps they left behind. As we can see, the previous surveys focused on different aspects of leaf disease classification, shedding light on some key areas of the research topic, but a comprehensive study is still missing.

First, we found that many related works have a narrow scope. The number of papers reviewed is not adequate to cover the broad concept of ML in leaf disease classification, and many papers used in the reviews are not up to date. For example, in [27,42], no more than 20 articles were selected from Google Scholar. Another survey paper [86], published in 2020, analysed only articles from before 2017. [5] analysed 26 academic papers about leaf disease detection and classification from 2015 to 2020. [21] surveyed more than 45 academic papers about plant disease detection and classification from 2017 to 2020. [48] covers 12 papers focusing on deep learning techniques only. [17] reviews shallow ML (10 articles) and DL (20 articles, including TL). [10] surveyed image processing with ML (3 articles), DL (5 articles) and SI (5 articles). [6] includes just 8 articles about potato leaf disease classification results. In a recent survey [19], 179 papers were studied; however, only 12 articles are from recent years (2020-2022), and not all of them are about leaf disease classification (the survey also covers plant species classification). In contrast, our paper focuses on more recent studies.
Second, we found that a comprehensive review of the available datasets for leaf disease classification is still missing. Many researchers have already noticed that the primary obstacle in this research topic is the availability of datasets [1,17,36,48]. For example, [56] surveyed 34 agricultural datasets; however, only one of them, the Maize Leaf (NLB) dataset [96], is related to leaf diseases. Unfortunately, many datasets introduced in the related work listed here are private [1,5,17,48]. Plant Village is one of the most popular public datasets [6,10,17,19,27,42,48]. This dataset is useful for scientific research; however, it has some pitfalls due to its laboratory conditions. In [10,17,48], the authors stressed the importance of real-field datasets. In another study, a combination of public (55% based on Plant Village) and private data (25%) is used [19]. Recently, there have been more calls for publicly available leaf disease data, which would bring greater benefits to both scientific and industrial communities [48].

Table 2:
Ref | Year | Strength | Limitation
[10] | 2022 | Detailed plant leaf disease introduction | Small number of articles: image processing with ML (3 articles), DL (5 articles) and SI (5 articles)
[6] | 2022 | Focus on potato leaf diseases | Just 8 articles about potato leaf disease classification results
[17] | 2022 | Lists the factors that may produce plant disease | Did not list available public datasets
[19] | 2022 | Detailed pre-processing, feature extraction & classifier analyses | Did not list available public datasets
[27] | 2021 | Concluded that multi-layer CNN performance is better than shallow ML | The number of papers reviewed is relatively small (17 papers)
[42] | 2021 | Focused on datasets, categories, DL methods and average accuracy | Most of the papers worked on Plant Village; the number of papers was too small (10 papers)
[21] | 2021 | Detailed analysis of 45 recent papers; pointed out the deficiency of SL compared to DL and that feature extraction may be unnecessary for DL | Different accuracy from different datasets; there may be a lack of benchmark datasets
[36] | 2021 | Analysed the weaknesses of SL, affirmed the superiority of DL and introduced transfer learning; also reviewed the common visualization techniques for explainability | Because of the lack of available data, most of the models had poor robustness; only suitable for special species and leaves
[47] | 2021 | Reviewed recent developments of DL (CNN, DNN and TL) | A few articles and studies in general
[48] | 2021 | Some summaries of plant pathology | A few articles about leaf disease classification and a small number of publicly available datasets
[53] | 2021 | Detailed analyses of image segmen- |

Third, there are many different machine learning approaches, and they need to be reviewed thoroughly. Early survey studies focus on traditional (shallow) approaches such as Artificial Neural Networks (ANN), Support Vector Machine (SVM), AdaBoost, KNN, Decision Tree, and Naïve Bayes (NB) [6,19,21,27,53,86]. In these approaches, data pre-processing and feature engineering are usually needed [27,36,53,86]. Feature engineering is an important step that extracts image features as inputs for ML models [21]. Normally, hand-crafted features are extracted, which requires the involvement of humans, i.e., domain experts who define useful features. For feature extraction, there exists a wide range of methods, including Local Binary Patterns (LBPs) Histogram, Speeded Up Robust Features (SURF), Scale-Invariant Feature Transform (SIFT), Gabor Energy Filtering, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Generalized Extreme Value (GEV) Distribution and Johnson SB Distribution [19].
Recent surveys have revolved around newer techniques, including deep learning methods such as CNN [6,19,21,47,48], AlexNet, GoogLeNet, and VGGNet [6,19,21], and Pooling Dilated CNNs [27]. Recently, traditional (shallow) approaches have been replaced by deep learning methods [87], as they may cause side effects [33,80] due to human errors/biases during the feature engineering step. A number of experimental results showed that DL is a powerful and useful way to detect and classify leaf diseases [5,6,21,27,36,47]. DL technologies are relatively user-friendly and can extract image features and classify plant diseases automatically [36]. For example, the higher accuracy of DL compared to the traditional (shallow) approach was demonstrated by [17]. They found that DL models, with and without pre-training, achieved average accuracies of 99.64% and 98.64% respectively, surpassing the 95.71% accuracy of the traditional approach. To improve further, recent studies enhance the performance of machine learning models, especially deep learning, with supplementary techniques such as segmentation [10,17,19], data augmentation [36], transfer learning [36,47,48], or a combination of traditional and deep learning [21]. [36] claimed that transfer learning would be the most effective method to boost the robustness of CNN classifiers. [21] employed a combination of different segmentation algorithms to extract better image features.
As we can see, each survey focuses on a different set of techniques and data based on various timelines. This makes it difficult to analyse and compare the research outcomes. Moreover, many software applications of ML for pathology, including leaf-disease analysis, have been developed recently and there is a lack of a review in this aspect. In this paper, we will address the limitations above by providing a comprehensive review of recent studies, public datasets, machine learning techniques, and real-life applications of machine learning in leaf disease classification.

Datasets
Data plays a critical role in modern AI, especially given the recent emergence of deep learning techniques. The quantity and quality of training data improve the performance of the large models used in deep learning [98]. In research and practice, the role of image datasets for computer vision tasks is self-evident. In [1], a study showed that the foremost challenge for research is the lack of available datasets. For leaf disease classification, in recent years, many researchers have devoted themselves to the collection of plant disease data for public use. Table 3 and Figure 2 show recently available public datasets about plant leaf diseases for computer vision research. In the table, the "Year" column represents the published year of a dataset. "Species" shows the number of plant species. The "Diseases" column lists the number of unique diseases. We also include a "Class" column to show the number of original classes in the dataset, as some datasets combine species and diseases as labels. We categorise the datasets into a multi-species group and a single-species group according to their species diversity.

[...],000 apple images, and six apple leaf health categories: "healthy", "complex", "rust", "frog eye leaf spot", "powdery mildew", and "scab". Among them, "complex" means a leaf is unhealthy but we are unable to identify an exact cause (disease). This dataset would be useful for multi-class apple leaf disease classification.

[...] [44,45] for leaf disease classification.

JMuBEN Datasets (JMuBEN, JMuBEN2, JMuBEN3). This is a group of datasets that were released by the same authors [25] and were all collected with a camera under the guidance of plant pathologists. JMuBEN and JMuBEN2 are about Arabica coffee leaves photographed in real coffee plantations. In the dataset, the experts also scored the disease severity (from 1 to 5); however, the Kaggle version did not include the scores [90].

UCI Rice Leaf Diseases Dataset. The UCI Rice Leaf Diseases dataset is intended for rice plant disease detection and classification [97]. It has three disease categories: Bacterial leaf blight, Brown spot, and Leaf smut, and each category has 40 images. Its limitation is that it is too small (120 images in total). It can be useful for prototyping machine learning methods and quick testing but may not be suitable for deep learning approaches, which require large amounts of data.

Multi-species Datasets
A multi-species dataset is composed of a variety of plant species, each with its own (possibly overlapping) set of diseases. The datasets in this group can be used for both species classification and disease classification.

PlantDoc Dataset. Compared to the Plant Village dataset, the PlantDoc dataset aims to provide real-field images. The authors of [57] were concerned that the images of Plant Village (e.g., Figure 3a) were all taken in laboratory setups and not in the real conditions of cultivation fields. This would impact the trained model's efficacy in real-life applications. Based on that, they built the PlantDoc dataset, which can serve as a sufficiently large-scale non-lab dataset for leaf disease classification. The images in PlantDoc have cluttered backgrounds and do not follow a standard format. A comparison between Plant Village images and PlantDoc images can be seen in Figure 3. PlantDoc has similar categories to Plant Village, with 2,598 leaf images from 13 plant species. In this dataset, there are 17 unique disease categories and 38 classes for the combination of species and diseases (e.g., Apple Scab Leaf). The images were annotated by experts.

2. Plant Village, Plant Leaves and Plantae K are laboratory datasets which can be useful for prototyping and evaluating machine learning models. However, real-field datasets would provide a more comprehensive evaluation and support for realistic applications.
3. We found that the available datasets are very useful for domain adaptation and multi-task learning; however, this is largely missing in the current literature. We suggest that machine learning models learn from different datasets in a compositional manner, so that a model can effectively adapt when new tasks/datasets are added.

Machine Learning Approaches
There are currently three general directions for machine learning approaches to leaf disease classification (see Figure 4): shallow learning (SL), deep learning (DL), and augmented learning (AL). In shallow learning approaches, leaf localisation is usually done first, and the diseases are then classified based on the localised diseased leaves. In addition, feature extraction is a necessary step of shallow learning, extracting the features of leaves before classification. Deep learning has recently emerged as a great tool for leaf disease classification thanks to its ability to offer an end-to-end process for learning and prediction. Deep learning does not require the feature engineering step and is able to learn an effective classifier directly from input images. At present, the comparison of the advantages and disadvantages of shallow learning and deep learning approaches is still inconclusive. However, there is strong agreement that SL has disadvantages in leaf image classification tasks, such as the inability to scale to large datasets, complex processing pipelines, and especially the need for feature extraction [21,36]. DL, however, also has two main disadvantages: it is computationally expensive and data-hungry. With the development of related hardware and computing systems, the computational cost of DL has been alleviated. For the data hungriness issue, recent approaches employ augmented learning techniques by generating artificial data and/or reusing pre-trained models from other domains/tasks.
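As an illustration of "generating artificial data", the sketch below produces a few geometric variants of a single image. It is a minimal example and not a method from any of the surveyed papers; the image is assumed to be a nested list of pixel values, and the function names (`hflip`, `vflip`, `rot90_clockwise`, `augment`) are hypothetical.

```python
def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def rot90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus three label-preserving variants."""
    return [img, hflip(img), vflip(img), rot90_clockwise(img)]


# A tiny 2x2 "image" stands in for a leaf photo (assumption for illustration).
leaf = [[1, 2],
        [3, 4]]
variants = augment(leaf)
```

Flips and rotations are reasonable for leaf images because the disease label does not depend on leaf orientation; colour-changing transforms would need more care, since colour is often a disease cue.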

Shallow Learning
Table 4 summarises the shallow machine learning approaches covered in this study. We focus on recent and notable papers from 2019. The general stages for leaf disease identification and classification using shallow learning include: data (image) acquisition, processing, segmentation (optional [10,53]), feature extraction, and identification (or classification) [10,27,36,53]. While data acquisition, processing, and segmentation are common in image processing generally, in this section we discuss two aspects that directly affect the quality of leaf disease classification.

Feature Engineering
Normally, data is collected with digital cameras (sometimes specialised cameras are used) to obtain basic features in colour models such as RGB [79], HSV [35,59], and CIELAB [59,79]. Among the three colour models, HSV is more popular than the others. For example, [79] collected 618 images from farms in RGB format, which were then converted to the CIELAB colour space and resized to 400 × 600 pixels. In [59], the authors used two colour models (HSV and CIELAB) on Plant Village data to perform segmentation for feature extraction. [35] collected 312 samples of tea leaves from three Indian tea gardens and converted them from RGB format to HSV for data pre-processing. From a colour model, we can extract more task-related features based on the spatial structure of the image data. The two most common methods for feature extraction are K-means clustering [59,72,79,93] and the grey-level co-occurrence matrix (GLCM) [23,24,60,72,89]. From the literature, we found that GLCM features achieve better performance than K-means features. Other extraction methods from image processing are employed as well. In [35], the authors used the Non-dominated Sorting Genetic Algorithm (NSGA-II) to detect the diseased area of the tea leaf and then applied Principal Component Analysis (PCA) to extract the 5 most significant features for classification. In [65] Combination of features extracted from different techniques. [40] preprocessed all tomato leaf images with the Gaussian filtering (GF) technique first. After that, they tried to combine two feature extractors, local binary patterns (LBP) and Scale Invariant Feature Transform (SIFT)
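To make the GLCM idea concrete, the following minimal sketch builds a normalised co-occurrence matrix for one pixel offset and derives three commonly used texture descriptors from it. It is pure Python for illustration only; the function names and the tiny quantised image are assumptions, and the surveyed papers may use different offsets, grey-level counts, or descriptors.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for the offset (dx, dy).

    `image` is a 2D list of integer grey levels in [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, angular second moment (ASM) and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    asm = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


# Toy image quantised to two grey levels (assumption for illustration).
patch = [[0, 0, 1],
         [0, 0, 1]]
features = texture_features(glcm(patch, levels=2))
```

The resulting feature dictionary is the kind of fixed-length vector that a shallow classifier such as SVM or KNN takes as input.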

Classifiers
SVM was the most common ML classifier for leaf diseases [23, 24, 35, 59-61, 65, 72, 79, 81, 82, 93]. [93] used a linear SVM to detect grape leaf disease, achieving 88.98% accuracy. However, the linear kernel only works well if the data is linearly separable, which is not the case in many applications. In [59], three SVM kernels (Linear, Polynomial, RBF) were compared on HSV and CIELAB features for black rot disease classification in grape plants. The results showed that an SVM model with the RBF kernel achieved the best accuracy of 94.1%. SVM was reported to be applied successfully to banana leaves (85% average accuracy) [79], tea leaves (83% average accuracy, 78% F1-score) [35], and grape vine disease (97.2% average accuracy) [65]. SVM, Logistic Regression and Random Forest were compared in [81] for tomato leaf disease classification. The results showed that SVM significantly outperforms Logistic Regression (20% better accuracy) and Random Forest (17% better accuracy). In [82], a more comprehensive comparison was carried out between SVM and four competitors (Linear Regression, KNN, Naïve Bayes and Decision Tree) using 9 different types of features. It also concluded that SVM performs best on tomato leaf disease diagnosis and severity measurement. A new SVM model, known as hierarchical SVM, was proposed in [24] to detect citrus leaf diseases; it achieved 91.76% accuracy, compared to 88.24% for traditional SVMs.
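The studies above report accuracy and, in some cases, F1-score. The sketch below shows how the two can diverge on an imbalanced test set. It is a minimal illustration with made-up labels, not data from any cited paper, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = diseased, 0 = healthy; 3 diseased leaves out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
```

Here the classifier misses two of the three diseased leaves yet still scores 80% accuracy, while F1 drops to 0.5, which is why accuracy figures from differently balanced datasets are hard to compare directly.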
Besides SVM, other classifiers can achieve high performance if suitable features are selected. For example, on a small private dataset, K-Nearest Neighbor (KNN) achieved 98.56%, better than the 97.6% of SVM [89]. In [60], KNN outperforms SVM when using GLCM features for grape leaf images, achieving 96.66% compared to 90% for the latter. For rice leaf disease classification [23], six ML algorithms, including RF, Naïve Bayes, Decision Trees, Logistic Regression, KNN and SVM, are compared. The feature set is a combination of Color Histogram, Hu Moments shape features, and Haralick texture features, which enabled RF to achieve the best performance (97.50% accuracy) on an IoT device (Raspberry Pi). [40] first pre-processed all tomato leaf images with the Gaussian filtering (GF) technique. After that, they combined two feature extractors, local binary patterns (LBP) and Scale Invariant Feature Transform (SIFT), with two ML classifiers, multilayer perceptron (MLP) and random forest (RF), to classify the tomato diseases. They measured the accuracy of each feature extractor with each classifier: SIFT & MLP 92.40%, SIFT & RF 91.20%, LBP & MLP 90.40% and LBP & RF 89.30%. Decision Tree is a simple classifier and can be useful for small datasets with a small number of classes [76]. That paper shows that, after relabelling the classes from four disease labels and one healthy label into a binary problem with 'healthy' and 'unhealthy' labels, Decision Tree can achieve 96% accuracy.
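As a rough illustration of how a simple classifier such as KNN operates on extracted feature vectors, the sketch below classifies a query by majority vote among its nearest training samples. The two-dimensional features and labels are invented for the example, and `knn_predict` is a hypothetical helper rather than the implementation used in [60] or [89].

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training samples."""
    # Euclidean distance from the query to every training vector.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Invented 2D features (e.g. two texture descriptors per leaf image).
train_x = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.95),
           (0.80, 0.20), (0.90, 0.10), (0.85, 0.25)]
train_y = ["healthy", "healthy", "healthy",
           "diseased", "diseased", "diseased"]

label = knn_predict(train_x, train_y, (0.20, 0.90), k=3)
```

Because KNN relies entirely on distances in feature space, its accuracy depends heavily on how well the chosen features separate the classes, which is consistent with the observation that feature selection decides which classifier wins.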

Take-home Messages
1. Shallow machine learning requires feature extraction from images [21] to be useful for the disease classification task. The two most common methods are K-means clustering and the grey-level co-occurrence matrix (GLCM), of which GLCM is more recommended. A combination of features is also encouraged, as it can help improve performance.
2. Support vector machine (SVM) was the most common ML method for leaf disease classification. It is suitable for both smaller (more likely to be linear) and non-linear datasets [63]. Its better performance in comparison to other classifiers is evident in several studies. However, if suitable features are selected, KNN or RF can achieve better accuracy.
3. For small datasets with a small set of disease classes, simple methods can achieve good results.

Deep Learning
Deep learning is a rising branch of machine learning which consists of different architectures and associated learning algorithms. For leaf disease classification, most deep learning models and algorithms are based on neural networks with many hidden layers. We categorise deep learning approaches for this task into deep neural networks, convolutional neural networks for image classification, and convolutional neural networks for object detection & classification. Table 5 provides a summary of recent deep learning approaches for leaf disease classification.

Image Classification CNNs
CNN is a class of neural networks where spatial information from the image structure is represented and learned through convolution operations. CNNs have been used widely in image processing and computer vision, especially in classifying images, and therefore have been useful for leaf disease classification as well.

Off-the-shelf CNNs. A plethora of convolutional neural networks has been developed to tackle a wide range of problems in image classification. One can easily pick up a model and apply it to classify diseases from leaf images.
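The convolution operation at the core of a CNN can be sketched in a few lines. The example below is a plain-Python illustration (no padding, stride 1) of the sliding-window operation most deep learning libraries call convolution, which is strictly a cross-correlation since the kernel is not flipped. The image and kernel values are invented, and `conv2d_valid` is a hypothetical name.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A dark-to-bright vertical edge, as might appear at a lesion boundary.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1]]  # responds where intensity increases left-to-right

response = conv2d_valid(image, edge_kernel)
```

In a trained CNN the kernel values are learned from data rather than chosen by hand, which is what removes the manual feature engineering step of shallow learning.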
LeNet & GoogLeNet. LeNet [99] is one of the earliest CNNs. Although it does not have a very deep architecture, its convolution idea is the inspiration for many deep CNN models today. In [70], LeNet achieved the lowest accuracy (94.0%) compared to other approaches on Plant Village. A newer model, called GoogLeNet (also known as Inception V1), was developed with improvements over LeNet and several novel components added, such as batch normalization, image distortions, and more layers. In [2], GoogLeNet achieved 95.69% accuracy and ranked 3rd among 7 CNN models for apple disease classification. In [91], it achieved 98.9% accuracy for the classification of maize leaf diseases.
VGG. Very Deep Convolutional Networks, known as VGG or VGGNet, embody an idea of how to effectively increase the depth of CNNs. VGG-16 (VGG with 16 layers) has been applied to tomato leaf datasets [84,85]. In [84], a pretrained model was used to achieve 77.2%. In [85], a better training approach was proposed, with a much higher performance of 90.1% accuracy. A deeper version of VGG, VGG-19, was employed in [80] to successfully classify tomato leaf diseases with 96.86% accuracy. [39] used VGG-16 for severity analysis; the proposed model achieved 91.22% accuracy. [22] applied VGG-16 and VGG-19 to a citrus leaf disease dataset. Notably, VGG-16 has been applied widely to grape leaf images [62,63,88]. [63] tested VGG-16 on their private grape leaf disease dataset (5 leaf diseases and 1 healthy category, 6000 images). A modification of VGG-16 was developed by replacing the last two fully connected layers with a Global Average Pooling layer. The results showed that the proposed model has the best accuracy (98.4%), significantly better than the standard VGG-16 and the combination of VGG-16 with an SVM classifier.
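To give a feel for why replacing fully connected layers with Global Average Pooling (GAP) shrinks a model, the sketch below implements GAP on a toy feature map and compares parameter counts of two classifier heads. This is an assumption-laden illustration: it uses the standard VGG-16 head for 224 × 224 inputs (a 7 × 7 × 512 feature map followed by two 4096-unit layers) and assumes the GAP output feeds a single 6-class layer; the exact architecture in [63] is not given in the text above and may differ.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list) to its mean value."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


N_CLASSES = 6  # 5 diseases + 1 healthy category, as in the grape dataset

# Standard VGG-16 head: flatten(7*7*512) -> 4096 -> 4096 -> classes.
fc_head = (dense_params(7 * 7 * 512, 4096)
           + dense_params(4096, 4096)
           + dense_params(4096, N_CLASSES))

# Assumed GAP head: one value per channel (512) -> classes.
gap_head = dense_params(512, N_CLASSES)

# Toy feature map with two 2x2 channels.
pooled = global_average_pool([[[1, 2], [3, 4]],
                              [[0, 0], [0, 8]]])
```

Under these assumptions the fully connected head has on the order of a hundred million parameters while the GAP head has a few thousand, which reduces both storage and the risk of overfitting on small leaf datasets.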
Inception. Inception is a class of CNNs that utilises Inception modules to build deeper structures with more efficient computation. In leaf disease classification, Inception V3 was the most popular among the different versions of Inception networks. It was employed for tomato leaf diseases [84]. In [29], Inception V3 achieved 95.41% on a rice disease image dataset, better than VGG-16 and ResNet-50. For the benchmark Plant Village dataset, Inception V3 was reported to reach 98.42% [41] and 99.74% [43]. The results differ because the studies use different partitions for training, validation, and test.
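Since results depend on how a dataset is partitioned, a fixed and documented split makes comparisons easier. The sketch below is one possible way to produce a reproducible, class-balanced train/validation/test split; the 80/10/10 fractions, the seed, and the function name `stratified_split` are assumptions for illustration and are not taken from [41] or [43].

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test, class by class.

    The same `seed` always yields the same partition.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = int(len(idxs) * fractions[0])
        n_val = int(len(idxs) * fractions[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Three classes with 40 images each, mirroring a small dataset layout.
labels = ["blight"] * 40 + ["spot"] * 40 + ["smut"] * 40
train_idx, val_idx, test_idx = stratified_split(labels, seed=42)
```

Publishing the seed or the index lists alongside reported accuracies would let other studies evaluate on exactly the same test images.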
ResNet. Among the many deep CNN models, ResNet is a powerful structure that allows models with a large number of layers to be trained to gain superior performance. ResNet-50 achieved 98.40% accuracy for tomato leaves [85]. [28] applied ResNet to achieve 82.78% on a modified Plant Village dataset. For Betelvine leaf disease, [68] [2]. Recent works integrate the idea of residual blocks from ResNet with the Inception module [41] to create InceptionResNetV2. Such a combination increases the performance from 98.42% to 99.11% on the Plant Village dataset.
MobileNet & EfficientNet. Besides the very deep models discussed above, some compact architectures have also been employed, driven by the increasing demand for IoT and hardware devices in plant pathology. For example, MobileNet can predict grape leaf diseases with 86% accuracy [62]. In [73], MobileNet was applied to predict diseases from Cassava leaves [90]. This public dataset has 1 healthy and 5 disease classes and was split into a training set (5,656 images), a validation set (1,889 images) and a test set (1,885 images). All images were resized to 224 × 224 pixels. The proposed MobileNet model achieved 85.38% accuracy. In [84], MobileNet was shown to achieve 63.75% on tomato leaf images. In [31], the authors employed three variants (B0, B4, B7) of EfficientNet to classify tomato leaf diseases (Plant Village's 10 tomato categories). The study considered three classification tasks: binary classification (healthy or unhealthy), six-class classification (1 healthy class, with the 9 diseased categories grouped into 5 classes, i.e., bacterial, fungal, viral, mold, and mite disease) and ten-class classification (1 healthy and 9 diseased). All images were resized to 224 × 224 and data augmentation was applied. The evaluation was carried out with 5-fold cross-validation. The results showed that for binary and six-class classification, EfficientNet-B7 had the best performance with accuracies of 99.95% and 99.12%, respectively. For ten-class classification, EfficientNet-B4 performed better than the other models with an accuracy of 99.89%.
Custom CNN. Although off-the-shelf CNN models were shown to be useful for leaf disease classification, they were originally designed and tested for general image classification tasks using benchmark datasets that differ considerably from leaf images. Therefore, they may not be optimal for this specific task, and a custom CNN model may suit a given dataset better. Many researchers have customised and developed their own CNN models, either from scratch or by modifying existing ones. In [84], a new CNN model was developed to classify tomato leaf diseases (extracted from Plant Village). The authors compared the proposed CNN model with MobileNet, VGG-16 and InceptionV3. The proposed model's accuracy is 91.2%, better than the others, and its storage footprint is the smallest (1,696 KB). [92] also studied tomato leaves from Plant Village. They used a CNN model with the Learning Vector Quantization (LVQ) algorithm to classify the diseases. The model achieved 86% average accuracy. Another variant of CNNs was proposed in [66] to classify two tea leaf diseases. The precision of this model was approximately 95.93%. In [78], the authors designed a new Multi Convolutional Layered-based CNN model and applied it to three sub-datasets (Peach, Pepper, and Strawberry) from Plant Village. They showed that their CNN can effectively classify the leaves of the three sub-datasets with accuracy from 87.47% to 99.25%. The CNN model in [85] was developed based on the Xception V4 architecture and was compared with several common pre-trained models, including VGG-16, ResNet-50, AlexNet and LeNet. The dataset used in this study consists of 10 classes of tomato leaves from Plant Village, where 14528 images were split into 80% for training and 20% for testing. The experimental results (in accuracy) are: the proposed model (99.45%), AlexNet (90.1%), LeNet (88.3%), ResNet (98.40%) and VGG-16 (90.1%). [62] tested a vanilla CNN and three pre-trained models (VGG-16, MobileNet & AlexNet). Finally, they built an ensemble model (average voting method) which achieved an accuracy of 100%.
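The average-voting ensemble mentioned for [62] can be illustrated with a minimal sketch. It assumes that each base model outputs a class-probability vector for an image; the function name `average_vote` and the toy probabilities are hypothetical and are not taken from [62].

```python
from typing import List, Sequence


def average_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest mean probability.

    model_probs[m][c] is the probability that model m assigns to class c
    for a single image (an illustrative layout, not the one used in [62]).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    # Average the predicted probability of each class over all models.
    mean_probs: List[float] = [
        sum(probs[c] for probs in model_probs) / n_models
        for c in range(n_classes)
    ]
    # The ensemble prediction is the class with the largest mean probability.
    return max(range(n_classes), key=lambda c: mean_probs[c])


# Toy example: three hypothetical models, three classes.
probs = [
    [0.60, 0.30, 0.10],  # model A favours class 0
    [0.20, 0.70, 0.10],  # model B favours class 1
    [0.25, 0.65, 0.10],  # model C favours class 1
]
prediction = average_vote(probs)
```

Averaging probabilities (rather than counting hard votes) lets a confident model outweigh two uncertain ones, which is one reason this variant is often preferred when the base models are well calibrated.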
A stacking approach was developed in [28] with the aim of improving classification accuracy. The dataset in this work is from AI-Challenger 2018 (which was modified from Plant Village); it contains 10 different plant species and 61 classes. They split the dataset into a training set (31718 images) and a test set (4540 images). After data augmentation, four models (Inception Network, ResNet, Inception-ResNet and DenseNet) were trained on the training set and then stacked. The stacking method achieved 87% accuracy, better than ResNet (82.78%), Inception Net (82.22%), DenseNet (83.44%) and Inception-ResNet (84.07%).
Another idea is to employ a hybrid approach that combines deep learning and shallow learning, where the deep model plays the role of a feature extractor [70]. In this work, AlexNet was combined with a linear SVM to classify diseases in the Plant Village dataset (resized to 227 × 227 pixels). The experimental results showed that the proposed model achieved 99.98% accuracy, better than the basic AlexNet (96.34%) and AlexNet with a Global Average Pooling layer (97.29%). In addition, they evaluated different optimizers (AdaMax, AdaDelta, Adam, RMSProp, SGD, AdaGrad) and showed that AdaMax had the best performance in this study.
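The hybrid idea (a deep network as feature extractor, a linear SVM as classifier) can be sketched as follows. This is only an illustration under stated assumptions: the two-dimensional feature vectors are hand-written stand-ins for activations that would come from a network such as AlexNet, and the SVM is trained with plain sub-gradient descent on the hinge loss, which is not necessarily the solver used in [70]. The names `train_linear_svm` and `predict` are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],  # +1 (e.g. healthy) or -1 (e.g. diseased)
    lr: float = 0.1,
    reg: float = 0.01,
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Fit w, b by sub-gradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # The sample violates the margin: push the hyperplane away.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Stand-in "deep features" (assumed; a real pipeline would read them
# from a late layer of a pre-trained CNN).
train_x = [[2.0, 0.5], [1.8, 0.3], [2.2, 0.4],
           [0.3, 1.9], [0.5, 2.1], [0.2, 1.7]]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
new_healthy = predict(w, b, [2.0, 0.4])
new_diseased = predict(w, b, [0.4, 2.0])
```

Because the CNN is only used to produce fixed features, the classifier on top can be retrained cheaply when new disease classes or images are added.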

Object Detection & Classification CNNs
In real-life scenarios, it would be useful if a system could detect leaves against cluttered backgrounds and classify their diseases. In this case, image segmentation can be applied as a first stage to extract the leaf area before applying CNNs for image classification, as discussed in the previous section. However, it would be more convenient to have an end-to-end approach where CNNs both detect leaves and identify diseases. In [55], the authors employed a Faster Region-based CNN (Faster R-CNN) model to detect and classify grape leaf disease with a best accuracy of 81.1%. Faster R-CNN was also the model of interest in [57] for an evaluation on the PlantDoc dataset. They claimed that fine-tuning Faster R-CNN with InceptionResnetV2 and MobileNet can reduce the classification error significantly. [39] proposed a model based on Faster R-CNN to detect tea leaf blight (TLB) and used VGG-16 for severity analysis. The disease classification dataset has 398 images, of which 80 made up the test set. The severity analysis dataset contains 270 mildly diseased leaf images (increased to 700 after augmentation) in the training set and 100 in the test set, and 700 severely diseased leaf images in the training set and 100 in the test set. The proposed model achieved 91.22% accuracy. [69] studied another variant of R-CNN, namely Mask R-CNN. They improved Mask R-CNN with ResNet50 and a Feature Pyramid Network as key components to classify Betelvine leaf diseases. For evaluation, a private dataset was collected from real cultivated Betelvine crops, containing two diseases, Anthracnose (358 images) and Phytophthora (456 images), and 1 healthy category (200 images). All images were resized to 256 × 256 pixels. The proposed Mask R-CNN model achieved an 84.07% F1-score, which is better than Faster R-CNN (74.32%) and the original Mask R-CNN (83.11%).
Early applications of deep learning attempted to integrate deep models with feature extraction. For example, in [38] and [67], the authors employed hand-crafted features for image segmentation before training CNNs to classify tomato leaf diseases. In particular, [38] employed k-means clustering for feature extraction, coupled with CNNs to estimate disease severity, although their results are not clearly detailed. In [67], Discrete Wavelet Transform (DWT) and grey-level co-occurrence matrix (GLCM) features were used to segment leaves from the background, which helped a CNN model to achieve 98.12% accuracy, better than AlexNet (95.75%) and traditional (shallow) neural networks (92.94%).
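To make the GLCM feature concrete, the following minimal sketch counts grey-level co-occurrences for one pixel offset and derives the contrast statistic from them. It is an illustration only: the function names `glcm` and `contrast`, the 4 × 4 toy image and the choice of a horizontal offset are assumptions, not the configuration used in [67].

```python
from typing import List


def glcm(image: List[List[int]], levels: int,
         dr: int = 0, dc: int = 1) -> List[List[int]]:
    """Count how often grey level i has grey level j at offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Only count pairs whose second pixel lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts: List[List[int]]) -> float:
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all level pairs."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        counts[i][j] * (i - j) ** 2
        for i in range(len(counts))
        for j in range(len(counts))
    )
    return weighted / total


# Toy 4-level image (values 0..3), horizontal neighbour offset.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(toy, levels=4)
toy_contrast = contrast(matrix)
```

Other texture statistics (e.g., energy or homogeneity) are computed from the same normalised matrix, so the counting step is usually shared across features.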

Comparison between DL and SL
Comparisons between SL and DL methods have been carried out extensively in recent years. When they are applied to the same datasets, DL methods tend to perform better. Deep learning approaches, such as CNNs, are very effective in image classification where abundant data is available, as CNNs can extract discriminative features from images automatically. Therefore, the descriptiveness of the feature extractors used in shallow learning can be a bottleneck for classifying leaf diseases from images. The details of the current comparison are shown in Table 6. [22] compared the performance of SVM, RF, Stochastic Gradient Descent (SGD), Inception-V3, VGG-16 and VGG-19 on the citrus leaf disease dataset. Using 10-fold cross-validation, the 3 deep learning methods were shown to be better than their shallow counterparts. A study in [83] compared logistic regression (LR), KNN and SVM with a CNN on the Plant Village dataset. The shallow learning methods in this work used K-means clustering as the feature extractor. The experimental results demonstrated that the CNN (98% accuracy) clearly outperformed the other ML methods (around 60%). A more in-depth study was presented in [32], where the authors analysed the weaknesses of several shallow learning methods, including K-Means, (shallow) artificial neural networks (ANN), Naïve Bayes, SVM and KNN. In the empirical results, K-Means and ANN had quite low accuracy, and Naïve Bayes had a slow convergence rate. Meanwhile, SVM achieved relatively poor performance and KNN had some dimensionality issues. This analysis led to an investigation into a CNN-based system to improve performance. As expected, the proposed CNN achieved the best accuracy (96%). [88] used general data augmentation methods, i.e., zooming, inversion, flipping and rotation, to keep training free from bias towards any particular class (i.e., to balance the data). In this work, the CNN model also achieved the best accuracy of 99%. This is better than the pre-trained models they tested (AlexNet: 86.5%, VGG-16: 97.5%) and the shallow learning approaches (Decision Tree, Naive Bayes, SVM, LDA, KNN, LR and RF). Among the shallow learning models, RF with the HSV-histogram feature achieved the best result (97.5%). The proposed CNN model in [87] can classify leaf diseases with 97.87% accuracy, better than the popular transfer learning approaches (AlexNet, VGG-16, Inception-v3 and ResNet) and shallow learning approaches (SVM, logistic regression, decision tree and K-NN). In another work [68], the authors employed Residual Networks (ResNet34) to construct a custom model with 99.40% accuracy and 96.51% F1-score. These results significantly surpass shallow learning models: SVM (50.69% & 50.57%), Decision Tree (72.23% & 72.02%), Logistic Regression (80.99% & 80.88%) and K-NN (87.86% & 88.06%). [2] used their proposed approach (integrating CNN with AlexNet and GoogLeNet cascade inception) to classify apple leaf diseases. Their proposed model achieved 97.62%, better than shallow learning methods including SVM (68.73%) and Back Propagation (54.63%).
From multiple comparative studies of shallow and deep learning, some researchers concluded that deep learning approaches based on CNN architectures can be more suitable and effective for leaf disease classification than shallow learning approaches [83]. CNNs do not require manual pre-processing or feature extraction, which may cause side effects [33,80], although such steps can shorten training time and reduce computation for shallow learning. Table 6 clearly shows that CNNs outperform shallow learning by a significant margin. However, if the dataset is small, shallow learning can be more useful [91]. For deep learning to be effective, the quantity of data should be sufficient. In the next section, we will show how augmentation has been emerging as a valuable tool for dealing with the data availability problem.
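Since the comparisons above are reported as accuracy and F1-score, a minimal sketch of both metrics may help when reading them. It assumes the macro-averaged F1 (the unweighted mean of per-class F1 values); individual papers may use a different averaging scheme, and the labels below are invented for illustration.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented labels for three classes.
y_true = ["healthy", "healthy", "blight", "blight", "rust", "rust"]
y_pred = ["healthy", "blight", "blight", "blight", "rust", "healthy"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

On imbalanced leaf datasets the two metrics can diverge noticeably, which is why studies that report both are easier to compare.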

Take-home Messages
• Deep learning models are useful for leaf disease classification and should be recommended in real-life applications due to their high accuracy. The common off-the-shelf deep learning models are CNN, AlexNet, VGG-16, ResNet, EfficientNet, Inception and MobileNet.
• Custom CNNs are highly encouraged as we should design an optimal model for different tasks. It was evident that custom CNNs perform better than off-the-shelf models.
• Deep learning is more effective than shallow learning in leaf disease classification. It is also more convenient as we can get rid of the feature extraction steps and minimise the manual effort for data processing.
• Compared with Table 4, we can see that the datasets used in deep learning papers were relatively larger than in other studies. This is consistent with the fact that deep learning models are usually data-hungry.
• Most of the studies focus on the performance (accuracy) aspect of the task, while a more comprehensive comparison including compactness and efficiency is still missing. There are a few papers that addressed these issues; for example, [83] evaluates models' speed and [84] evaluates models' storage space.
• Different studies use different experiment settings, including different partitions for training/validation/test, which makes their results difficult to compare. Therefore, a benchmarking study is needed.
Besides texture augmentation, researchers have also used colour augmentation to process leaf images, such as adjusting brightness, contrast, saturation and hue [7,11], and Principal Component Analysis (PCA) colour augmentation [87]. It is worth noting that colour augmentation has pitfalls for leaf images, as colour is important for identifying diseases. Therefore, we should be careful not to destroy or alter the original features of the leaf images. For example, some researchers used colour augmentation methods to alter the colours of leaf images [11,87,91], but the authors of [36] pointed out that colour may be one of the most important manifestations of some leaf diseases, so changing the colour features of the original images may have negative effects.
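As a small illustration of the trade-off, the sketch below applies a mild brightness change and a horizontal flip to an RGB image stored as nested lists. The function names, the toy pixels and the factor 1.2 are assumptions chosen for illustration; the flip leaves colours untouched, whereas the brightness factor should stay close to 1 if disease-related colour cues are to be preserved.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale every RGB channel by `factor` and clip the result to [0, 255]."""
    return [
        [tuple(min(255, max(0, round(ch * factor))) for ch in px) for px in row]
        for row in image
    ]


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left to right; pixel colours are not changed."""
    return [list(reversed(row)) for row in image]


# Toy 1 x 2 image: a dark green pixel and a pale pixel.
toy_image: Image = [[(100, 150, 50), (200, 220, 240)]]

brighter = adjust_brightness(toy_image, 1.2)   # mild colour change
flipped = horizontal_flip(toy_image)           # geometry only
```

Note that clipping at 255 already discards information for bright pixels, which is one concrete way in which aggressive colour augmentation can distort symptoms.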
The augmentation methods mentioned above may have limitations such as poor quality, inadequate diversity, and unevenness [36]. Recent approaches, including Generative Adversarial Networks (GAN) [102], employ deep learning to generate artificial data. A GAN employs a neural network called a generator to produce new images, not present in the training set, that are intended to fool a classifier (the discriminator) into treating them as if they belonged to the classes of that set. In the case of leaf images, a GAN can generate new images for different disease types. Compared with non-learning methods, GAN-based data augmentation relies on generative modelling, where the focus is on creating artificial samples that retain the characteristics of the original dataset.
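The adversarial game between generator and discriminator is commonly written as the following minimax objective from the original GAN formulation. The symbols are standard notation rather than notation from the surveyed papers: G is the generator, D the discriminator, p_data the distribution of real leaf images, and p_z the noise distribution fed to the generator.

```latex
\[
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\]
```

The discriminator maximises this value by scoring real images high and generated images low, while the generator minimises it by producing images that the discriminator scores as real.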
GAN has recently been widely used to create more samples [36]. In [16], the original dataset comprises a total of 3941 images, including 1858 images of bacterial blight and 1706 images of leaf blast. After applying GAN augmentation, the dataset size increased to 9101 images, with 3767 images representing bacterial blight and 5034 images representing leaf blast. The experimental results showed that the accuracy of CNN models can be improved with data generated by GAN.
Besides the texture/colour-based transformations and GAN approaches, some new methods have been developed. For example, [46] proposed two image augmentation (IA) methods: an image pre-processing & transformation algorithm (IPTA) and an image masking & REC-based hybrid segmentation algorithm (IMHSA). The methods aim to produce a sufficient quantity of training leaf disease images to improve the richness of small datasets. IPTA is an adaptive supervised learning approach that transforms the original images into augmented images. IMHSA is an unsupervised approach for RGB image segmentation. The empirical study showed that with augmented data the validation accuracy rose from 65% to 73%.

Model Augmentation (Transfer learning)
Transfer Learning (TL) is a technique in machine learning that allows models trained on one task to be adapted to perform another task. It is also a way to augment a learning model by reusing knowledge learned in other domains for different (but related) tasks. This could be useful in leaf disease classification, as models trained on one type of plant could potentially be adapted to work on other plants. There are many related works in this direction, including domain adaptation and multi-task learning. In practice, however, the most common approach is to take models pre-trained on a huge public dataset (e.g., the ImageNet dataset) for other tasks and then deploy them on the target leaf disease dataset (e.g., Plant Village). In [58], the authors showed that transfer learning can shorten the training time of CNN models significantly. This idea has been deployed and studied widely in leaf disease classification. Table 8 lists recent work on transfer learning methods in leaf disease classification.
A study in [80] adopted several pre-trained deep learning models, including MobileNetV2, EfficientNetB0 and VGG-19, to classify tomato leaf diseases (1 healthy and 9 diseased classes). From the experimental results (MobileNetV2: 97.26% accuracy, EfficientNet-B0: 98.6% accuracy, VGG-19: 96.86% accuracy), they claimed that transfer learning has several advantages: smaller models, lower computational cost, and suitability for mobile devices. In [58], the authors utilised a pre-trained VGG-16 and fine-tuned it on their collected grape and apple leaf dataset. The model achieved 97.87% accuracy, showing that transfer learning can improve the performance and efficiency of CNN models. Another work [64] pointed out that one leaf may contain multiple leaf diseases in real life; thus, the authors used montage images (footnote 1: see Table 3) with pre-trained VGG-16, ResNet50 and InceptionV3 to classify rice leaf diseases. The dataset contains 3 leaf disease categories and 1 healthy category (resized to 224 × 224 pixels). Each class of the training set has 1000 images and each class of the test set has 300 images. Finally, the fine-tuned VGG-16, ResNet50 and InceptionV3 (with different hyper-parameters) achieved 87.08%, 93.41% and 95.41% accuracy, respectively. [30] deployed pre-trained GoogLeNet and VGG-16 for tomato leaf disease classification with accuracies of 99.23% (GoogLeNet) and 98.00% (VGG-16). A similar study can be seen in [71], where the authors transferred a pre-trained VGG-16 to classify tomato leaf diseases. They tested several variants of VGG-16, including (i) a fresh VGG-16 (trained from scratch); (ii) a classic transfer learning VGG-16 pre-trained on ImageNet; (iii) a pre-trained VGG-16 with dropout and L2 regularization; and (iv) a pre-trained VGG-16 with dropout and an attention module. They claimed that version (iv), with dropout and an attention module, can effectively improve accuracy and reduce validation loss compared with the other versions. The proposed model in [33] is based on a pre-trained ResNet50. Only its last layer was fine-tuned, and a global average pooling layer was added with two 512-neuron dense layers on top. The result of this model, a 98% F1-score, shows the advantage of transfer learning. [77] presented a pre-trained ResNet-50 with a data augmentation method to detect and classify 6 categories of tomato leaf diseases (Plant Village). The dataset was increased by four times through data augmentation. They showed that the accuracy of their proposed ResNet-50 model reached 97% after fine-tuning the transferred model. In [41], the authors transferred common pre-trained models (InceptionV3, Inception-ResnetV2, MobileNetV2 and EfficientNetB0) with a depthwise separable CNN method to classify diseases across the entire Plant Village dataset. The input size was set to 224 × 224 pixels, and the dataset was split with three test-set proportions: 20%, 30% and 40%. Compared with the other models, EfficientNetB0 achieved the best accuracy of 99.56% on the test set. They observed that the different splits had little impact in this study. Using a smaller subset (5 types of crops from Plant Village), [43] tested fine-tuned MobileNet and InceptionV3 models. In this work, the leaf images were all processed by a segmentation method, and the two models achieved 99.62% and 99.74% accuracy, respectively.
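The core mechanism behind fine-tuning only part of a pre-trained model, as in [33], is that frozen layers keep their weights while the remaining layers receive gradient updates. The sketch below shows this for a toy parameter dictionary; the group names `backbone` and `head`, the numbers and the function `sgd_step` are hypothetical and stand in for what a deep learning framework would manage internally.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params,
             trainable: Set[str], lr: float = 0.1) -> Params:
    """Apply one SGD update, leaving frozen parameter groups untouched."""
    updated: Params = {}
    for name, values in params.items():
        if name in trainable:
            # Trainable group: standard gradient descent update.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Frozen group: copy the pre-trained weights unchanged.
            updated[name] = list(values)
    return updated


# "backbone" plays the role of pre-trained layers, "head" the new classifier.
params = {"backbone": [0.5, -0.2, 0.8], "head": [0.0, 0.0]}
grads = {"backbone": [0.1, 0.1, 0.1], "head": [1.0, -2.0]}

new_params = sgd_step(params, grads, trainable={"head"})
```

Freezing most of the network reduces the number of parameters to be learned, which is consistent with the shorter training times reported for transfer learning.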

Take-home Messages
1. Both data and model augmentation can help improve the performance and robustness of machine learning approaches for leaf disease classification. More attention has been given to transfer learning, where pre-trained models can be reused to augment the learning on leaf images.
2. Although data augmentation can be useful, some researchers are sceptical about its effect. This is because some data augmentation methods (e.g., random cropping, colour transformation) can change the semantics of the original images, which may create misleading images and reduce the performance of classification models [104].
3. More attention is being paid to transfer learning, and the results reported in Table 8 are satisfactory. This is reasonable as there are abundant pre-trained models on image data available for public use.
4. Combining data augmentation and model augmentation is a promising idea. However, this combination has not been studied properly. We would encourage more studies in this direction.

Applications
In this section, we will review different leaf disease classification applications, from prototyping/lab-based products to commercialised software.We categorise the applications into: Web-based apps, Mobile apps, and Devices & Hardware.

Web-based Apps
Website-based applications are often the first choice of industry and researchers because they are easy to use and not limited by hardware configuration. The user can submit a picture captured by a camera, from a computer or a mobile phone, and get the predicted results in real time. Several examples of web-based apps are shown in Table 9. For example, Plant Disease Identifier (https://cropify.herokuapp.com/) is a website that provides tomato and potato leaf disease classification. A user only needs to choose a picture of the leaf to submit and will get the predicted result shortly afterwards. A rice disease classification system can be deployed on a website and WhatsApp (See Figure 5b). This system can diagnose three diseases of rice (based on a CNN model) and identify the severity of the diseased area (as a percentage, based on image segmentation). The dataset used here is the HCI Rice Leaf Diseases Dataset, which contains 136 images of three rice diseases. The accuracy of this system is 85.7% [8].
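A severity percentage derived from segmentation can be sketched as the share of leaf pixels that are also marked as lesion. This is one common definition and is assumed here for illustration; how [8] computes severity is not detailed in this section, and the function name `severity_percent` and the toy masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = pixel belongs to the region, 0 = it does not


def severity_percent(leaf_mask: Mask, lesion_mask: Mask) -> float:
    """Percentage of leaf pixels that are also lesion pixels."""
    leaf_pixels = 0
    lesion_pixels = 0
    for leaf_row, lesion_row in zip(leaf_mask, lesion_mask):
        for is_leaf, is_lesion in zip(leaf_row, lesion_row):
            if is_leaf:
                leaf_pixels += 1
                # Lesion pixels outside the leaf (background) are ignored.
                if is_lesion:
                    lesion_pixels += 1
    if leaf_pixels == 0:
        return 0.0  # no leaf found, so severity is undefined; report 0
    return 100.0 * lesion_pixels / leaf_pixels


# Toy 3 x 4 masks: 8 leaf pixels, 2 of which are diseased.
leaf = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
lesion = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
severity = severity_percent(leaf, lesion)
```

In a deployed system both masks would come from a segmentation step, so the quality of that step bounds the reliability of the reported percentage.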

Mobile Apps
In recent years, mobile apps have become more popular. With the development and popularity of smartphones, mobile apps can offer a better user interface and user experience.
There are several examples of mobile apps for leaf disease classification from industry. CropsAI is an iOS mobile app which can predict the common leaf diseases of 5 species (Corn, Wheat, Tomato, Soybeans & Rice). Plants Disease Identification is a popular iOS mobile app priced at $2.99 on the App Store. Agrio is another mobile app, which supports both Android and iOS. It claims to have an AI-based alert system (requiring remote sensors) that notifies subscribed users and provides written preventative measures when diseases or pests are detected or expected. Plantix is an Android mobile app which can classify leaf diseases of 30 main crops. It can provide instant disease classification and treatment advice. Notably, Plantix is reported to have the largest online community of farmers and agricultural specialists in the world [105]. Users of Plantix can gain and share knowledge and help each other. Leaf Doctor was a mobile app created by the University of Hawaii, available only on iOS. Leaf Doctor supports leaf disease classification and provides disease severity estimation (See Figures 5i, 5j & 5k). A limitation of mobile apps is that the software depends on the smartphone's operating system and configuration. If a smartphone has a low configuration or an outdated system, the software may not work properly or may run slowly.
From the research community, both [74] and [7] designed a mobile app for leaf disease classification. The app in [7] can classify tomato leaf diseases. Its training dataset consisted of tomato leaves from Plant Village and the prediction model was based on a CNN. They showed that their app could achieve 97% accuracy. In contrast, the mobile app in [74] provides both disease classification and real-time monitoring of field factors (e.g., temperature, humidity, moisture) (See Figure 5a). It was based on a CNN model trained on part of the Plant Village dataset. The authors demonstrated that their app can achieve 87.43% accuracy on leaf disease prediction.

Devices & Hardware
Devices or custom hardware are often required by professional agricultural specialists or researchers because dedicated hardware can offer more computing power and more reliable performance. A study in [9] pointed out that existing deep learning approaches need high processing power and may not be suitable for low-budget mobile devices. However, high-specification hardware requires more capital investment as well as professional technical capability and training. We show some examples from research as follows.
In [75], a robotic vehicle was designed and developed (See Figure 5c) to detect Basil/Tulsi leaf diseases. Its components include a microcontroller, a Bluetooth module, a camera module and a remote computer system. In the image detection module of the system, they used K-Means clustering and an SVM classifier implemented in MATLAB. Users can get the prediction result from the software interface (See Figure 5d). In [4], a novel framework (named IoT_FBFN) was proposed. This framework is based on a Fuzzy Based Function Network (FBFN) with IoT technology. It captures real-time leaf images through a Raspberry Pi camera and transmits them over the internet to the system, where the FBFN classifies diseases. They trained the system using a dataset of about 470 trees planted alongside the road in India. They demonstrated that the proposed system can achieve 80.66% average specificity and 80.18% average sensitivity, better than K-means and SVM. A handheld device (embedded platform) was developed in [3] (See Figure 5f). With this handheld device, the classification accuracy can reach 96.88%. The device first detects leaves using a camera, then divides the image and localises the leaves through data annotation and MobileNet. This module was trained on 338 leaf images the authors collected, 52 online images and 111 images from Plant Village. Finally, a custom CNN was used to classify diseases. This CNN was trained on 20 categories of Plant Village (apple, corn, potato & tomato). The system has a degree of robustness against varying conditions (e.g., weather, illumination & background). An interesting device, named Smart Glass, was developed in [9]. This wearable device can be more convenient than the handheld devices mentioned previously. It is based on a Raspberry Pi Zero W and can identify whether a leaf is healthy or not in real time (See Figure 5e). The classification module used in Smart Glass is a transfer learning approach with a YOLOv3 + CNN architecture fine-tuned on 304 tomato leaf images from farms (split into two categories: healthy and unhealthy). The proposed model can achieve an average accuracy of 82.38%.
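The sensitivity and specificity figures quoted for IoT_FBFN [4] in this paragraph follow the usual definitions for a binary decision (diseased versus healthy). The sketch below computes both from predicted and true labels; the labels are invented for illustration and the function name `sensitivity_specificity` is hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) treating `positive` as the disease class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # diseased leaf correctly flagged
            else:
                fn += 1  # diseased leaf missed
        else:
            if p == positive:
                fp += 1  # healthy leaf wrongly flagged
            else:
                tn += 1  # healthy leaf correctly passed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented example: 4 diseased and 5 healthy leaves.
truth = ["diseased"] * 4 + ["healthy"] * 5
preds = ["diseased", "diseased", "diseased", "healthy",
         "healthy", "healthy", "healthy", "healthy", "diseased"]

sens, spec = sensitivity_specificity(truth, preds, positive="diseased")
```

For field devices, a missed disease (low sensitivity) and a false alarm (low specificity) carry different costs, so reporting both is more informative than accuracy alone.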
Besides hand-held and wearable devices, Unmanned Aerial Vehicles (UAVs) are attracting more attention [20,54]. UAVs have great potential for the future of agriculture. In [20], a team designed a drone (a DJI Phantom 3 quadcopter) with a pre-trained EfficientNetV2-B4 to detect leaf diseases. The classification module was trained on Plant Village and achieved near-perfect accuracy of 99.99%. In industry, the American company Agremo started using drones to detect leaf diseases and weeds in sugar beet farms. Drones are especially suitable for continuous inspection and for work on large-scale farms. The company claims that its drones can provide plant counts, location data for certain weeds and diseases, and identification of irrigation problems (water stress). The data collected by drones can readily be visualised to help farmers analyse leaf diseases, weeds, water issues and so on.

Take-home Messages
1. A wide range of apps and devices have been built using machine learning techniques (mostly deep learning).
2. Mobile apps are becoming more popular than web apps for individual users thanks to their compactness and mobility. Meanwhile, UAVs (drones) have potential in large-scale farming. Some prototypes of hand-held and wearable devices were tested but they are not ready for commercialisation.

Conclusions
Although machine learning techniques have been widely used in leaf disease classification, to the best of our knowledge, a comprehensive and up-to-date survey covering the related available data, techniques and applications is still desired by the industry and research community. Therefore, in this paper, we surveyed about 100 recent related articles, collected and listed a series of public datasets available for research, analysed state-of-the-art machine learning approaches (i.e., shallow learning, deep learning & augmented learning) and reviewed feasible applications in academia and industry. We have the following findings.
In the data part, the Maize Leaf (NLB) dataset could be the largest public dataset of a single plant species at present, while Plant Village is the most popular dataset. Plant Village, Plant Leaves and Plantae K are all laboratory datasets which can be useful for prototyping and evaluating machine learning models. However, real-field datasets, including PlantDoc, would provide a more comprehensive evaluation and support for realistic applications.
For technologies, shallow machine learning requires feature extraction from images [21] to be useful for the disease classification task. The two most common methods are K-means clustering and the grey-level co-occurrence matrix (GLCM), of which GLCM is more recommended. A combination of features is also encouraged, as it can help improve performance. The support vector machine (SVM) was the most common method for leaf disease classification in shallow machine learning. It is well suited to both smaller (more likely linear) and non-linear datasets [63]. Its better performance in comparison to other classifiers is evident in several studies. However, if suitable features are selected, KNN or RF can also achieve better accuracy. Deep learning models have proven useful and more effective than shallow learning for leaf disease classification, and should be recommended for real-life applications due to their high accuracy. Deep learning is also more convenient as we can get rid of the feature extraction steps and minimise the manual effort for data processing. The common off-the-shelf deep learning models are CNN, AlexNet, VGG-16, ResNet, EfficientNet, Inception and MobileNet. Custom CNNs are highly encouraged as we should design an optimal model for different tasks. It was evident that custom CNNs perform better than off-the-shelf models. We can see that the datasets used in deep learning papers were relatively larger than in other studies. This is consistent with the fact that deep learning models are usually data-hungry. Most of the studies focus on the performance (accuracy) aspect of the task, while a more comprehensive comparison including compactness and efficiency is still missing. There are a few papers that addressed these issues; for example, [83] evaluates models' speed and [84] evaluates models' storage space.
Recent research showed that both data and model augmentation methods can help improve the performance and robustness of deep learning for leaf disease classification. More attention is on transfer learning, where pre-trained models can be reused to augment the learning on leaf images. Although data augmentation can be useful, some researchers are sceptical about its effect. The reason may be that some data augmentation methods (e.g., random cropping, colour transformation) can change the semantics of the original images, which may create misleading images and reduce the performance of classification models [104]. The popularity of transfer learning is reasonable as there are now abundant pre-trained models on image data (e.g., ImageNet) available for public use.
For applications, Section 6 showed that a wide range of applications (software) and devices (hardware) have been built using machine learning techniques (mostly deep learning). Mobile applications are becoming more popular than web apps for individual users thanks to their compactness and mobility. Meanwhile, UAVs (drones) have advantages and potential in large-scale farming. Some prototypes of hand-held and wearable devices were tested but they may not be ready for commercialisation.
Last but certainly not least is the explainability of machine learning methods. With the increasing adoption of machine learning in the agriculture industry, there is a pressing demand for models to be transparent and explainable. This may be important for enabling farmers to understand the decision-making process and trust this new technology. Based on the above findings, we have the following suggestions.
1. The available datasets listed are useful for domain adaptation and multi-task learning; however, such work is largely missing in the current literature.
2. A machine learning model should learn from different datasets in a compositional manner, where the model can effectively adapt to new tasks/datasets as they are added.
3. For small datasets with a small set of disease classes, simple methods may achieve good results.
4. Many studies use different experiment settings, including different partitions for training/validation/test, which makes their results difficult to compare. Therefore, a benchmarking study is needed and encouraged.
5. Research on explainability in this area remains worthy of attention, as the industry still requires a means to effectively explain decision-making by machine learning models to enable user understanding.
6. Combining data augmentation and model augmentation is a promising idea. However, this combination has not been studied properly.

Fig. 1: The number and publication years of the referenced articles. The red colour indicates the number of review papers on leaf disease classification.

Table 2: Recent Review Papers

Table 3: Public Leaf Disease Datasets

This dataset is considered the largest open dataset of a single plant species at present, and will be helpful for maize disease classification and severity assessment.
Citrus Dataset. The Citrus dataset has two folders, one with 150 images of citrus fruits and one with 609 images of citrus leaves; each folder has 5 categories (black spot, canker, greening, melanose, and healthy). All images were annotated by experts.
Rice Diseases Image Dataset. The Rice Diseases Image Dataset has four categories of rice leaves: Brown Spot (523 images), Healthy (1488 images), Hispa (565 images) and Leaf Blast (779 images). The dataset has been studied in several works
All images were collected by 200 farmers through small phones and labelled by experts. The dataset has two parts: a training set (9,436 annotated images) and a test set (12,595 unlabeled images).
They can be combined into a larger dataset. JMuBEN has three categories: 7682 Cerscospora images, 8337 rust images and 6572 Phoma images. JMuBEN2 has two categories: 16,979 Miner

Table 4: Machine Learning Technologies
Deep neural networks are neural networks with multiple hidden layers, one on top of another. Previously, training such deep structures was difficult due to the problem of vanishing/exploding gradients, but current learning techniques can turn that curse into a blessing, thanks to the availability of big data and powerful computing resources. We can use deep neural nets as classifiers, similar to shallow learning approaches. In [34], deep belief networks (DBN) were studied, together with other variants of multi-layer feedforward neural networks, for pepper leaf disease classification. The models were evaluated on two datasets. The first dataset is self-collected, consisting of 1500 images of healthy and diseased leaves. The other dataset contains 300 healthy and 35 diseased images from Plant Village. All samples are resized to 256 × 256 pixels. The features used in this study were Gray Level Co-occurrence Matrix (GLCM) features. The average accuracy and F1-score of DBN are 91.956% and 0.77546, respectively. The results show slightly better performance. The use of feature engineering in deep learning does not seem useful, as deep models themselves are effective feature extractors. Instead of two stages (feature extraction + classification), deep convolutional neural networks (CNN) can learn discriminative features that are useful for classification in an end-to-end manner.
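To make the GLCM features mentioned above more concrete, the following Python sketch computes a normalised co-occurrence matrix for one pixel offset and derives three commonly used texture descriptors from it (contrast, angular second moment and homogeneity). It is only an illustration under stated assumptions: the function names `glcm` and `glcm_features`, the horizontal offset, the choice of descriptors and the tiny 4 × 4 image with four grey levels are ours, and the exact GLCM configuration used in [34] is not described here.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for the offset (dx, dy).

    `image` is a list of rows of integer grey levels in [0, levels).
    Hypothetical helper written for illustration only.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count how often level i is followed by level j at the offset.
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment (ASM) and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Toy 4 x 4 image with four grey levels (illustrative data, not from [34]).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(image, levels=4))
print(features)
```

In a two-stage pipeline, such a feature vector (usually computed for several offsets and angles) would be passed to a separate classifier, whereas a CNN learns its features directly from the pixels.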

Table 5: Summary of deep learning approaches.

Table 6: Comparison Between ML & DL

Table 9: Various Applications