1 Introduction

Tomato (Solanum lycopersicum) is one of the most widely grown vegetables in the world. The yearly global yield of this plant reaches 160 million tons, and it is the second most-consumed vegetable in the world [1]. However, tomatoes are very vulnerable to fungal, bacterial, and viral diseases such as late blight, powdery mildew, or early blight [2]. Some of these diseases, such as early blight caused by Alternaria solani, can reduce yield by 80% under conditions favorable for the pathogen [3, 4]. Farmers counteract the effects of the diseases in various ways, e.g., by fungicide control or by planting varieties that are partially resistant to diseases. However, a key element of disease control is a correct diagnosis. Currently, the standard diagnosis of these diseases is based on a microscopic assessment of the plant tissue damage and on assessing the number of pathogen spores. Therefore, a non-invasive, reliable, and fast method of disease recognition is indispensable in modern agriculture [5].

The use of spectroscopy, along with digital signal processing algorithms and artificial intelligence, is increasingly popular among researchers [5,6,7]. Spectral data are used on a large scale in remote sensing thanks to space agencies such as NASA or ESA. The data are useful in, among others, crisis management, climate change monitoring, and agriculture. In agriculture, useful applications include indicators [8] that enable the determination of plant vegetation [8, 9], water stress [10, 75], soil moisture [11, 12], and other soil properties [13].

The present study is a continuation of the authors’ previous work [14], which discussed the possibility of early detection of several tomato pathogens using spectroscopy. As an extension, we further developed the machine learning models with the aim of increasing the precision of the method. Specifically, in contrast to the previous work, three main goals were set for the present work:

  • Goal 1: finding the best classifier for distinguishing diseased plants from healthy ones based on spectroscopy in six experiment variants:

    • CS (control samples) versus objects infected by AN - Anthracnose (Colletotrichum coccodes),

    • CS versus objects infected by BS - Bacterial speck (Pseudomonas syringae),

    • CS versus objects infected by EB - early blight (Alternaria solani),

    • CS versus objects infected by LB - late blight (Phytophthora infestans),

    • CS versus objects infected by SL - Septoria Leaf Spot (Septoria lycopersici),

    • CS versus randomly selected measurements of infected objects (AN, BS, EB, LB, SL).

  • Goal 2: data processing method development, i.e., creating a data processing procedure that achieves the highest improvement in binary classification results,

  • Goal 3: hypothesis verification that selected spectral bands are especially important in the disease analysis process.

The data used for this study consist of 3877 leaf reflectance spectra taken with an ASD FieldSpec 4 Hi-Res spectroradiometer in the spectral range 350–2500 nm, together with metadata for each measurement such as disease, date of measurement, date of infection, etc. The dataset contains control samples (CS) and infected samples (AN, BS, EB, LB, SL) at various stages of infection, from the early stage, where symptoms are not visible to the naked eye, to the late stage, which is a novelty compared to the research conducted so far.

A number of mature machine learning algorithms can be employed to investigate spectral information for the purpose of classification. In Sect. 4 we enumerate the methods currently used in pathogen detection applications, including linear models, tree-based methods, ensemble machine learning, and neural networks.

2 Diagnostic methods for Solanum lycopersicum diseases

The issue of detecting plant diseases attracts the interest of many researchers, which is reflected in numerous publications [5, 9, 15,16,17,18]. Among other studies, one can find research targeting pathogens on Asian soybean [19], wheat infected with Fusarium head blight [20], and sugar beet [21], or even the phenotyping of disease resistance of crops [22]. A significant part of the research concerns tomato crops [14, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40].

Conventional disease diagnosis methods involve the observation of disease symptoms. However, the symptoms of the various diseases can be markedly similar to each other, which makes it difficult even for experienced farmers to correctly diagnose the disease. The detailed examination requires, therefore, the collection of plant material and laboratory tests [41]. In the case of fungal diseases, the pathogen has to be isolated and cultured in vitro. The identification is confirmed using microscopic techniques by a taxonomist based on spore morphology and conidiogenesis [41]. Biochemical tests can determine bacterial diseases, while viruses are identified based on genetic material, transmission assays, and their host range [42].

Invasive methods, including enzyme-linked immunosorbent assay, polymerase chain reaction, and immunofluorescent assay, are imperfect in terms of cost, speed, and accuracy, while near-infrared spectroscopy and hyperspectral imaging allow for equally good, cost-effective, and non-invasive plant diagnostics [43]. These techniques are not limited to tomato cultivation, with reported applications to, e.g., potato [44] or rice [45, 46], as well as to other plant-related phenomena, such as chlorophyll modeling [47, 48] or plant phenotyping [49, 50].

Some authors note that the rapidly developing agrotechnical and food processing industries require the development of non-invasive diagnostic methods [43, 51]. Digital cameras [31,32,33,34, 52,53,54], multispectral imaging [30, 55, 56] and spectroscopy [26,27,28, 35, 37, 57] have been used in research on that issue.

Attempts to diagnose crops and identify Solanum lycopersicum diseases on the basis of spectral measurements were made by, among others:

  • P1 - Zhang et al. [24];

  • P2 - Wang et al. [36];

  • P3 - Jones et al. [38];

  • P4 - Moghadam P. et al. [26];

  • P5 - Xie C. et al. [35];

  • P6 - Lu J. et al. [37];

  • P7 - Pereira J. et al. [40].

In the mentioned publications, various spectral measurements were used, from ultraviolet (UV) through the visible light range (VIS) and near-infrared (NIR) to short-wave infrared (SWIR). In publications P1 and P2, a GER spectroradiometer working in the 400–2500 nm spectral range was used. In P3, a Cary 500 spectroradiometer measuring within the 200–2500 nm spectral range was employed. A set of two hyperspectral Headwall cameras, imaging in the 400–1000 nm and 900–2500 nm ranges, was used in P4. In P5, a hyperspectral camera imaging within the 380–1023 nm spectral range was used. Results presented in P6 were based on spectroradiometer measurements in the 400–1050 nm spectral range.

A synthetic summary of the characteristics of the conducted measurements, research, and results is presented in Table 1.

Table 1 A synthetic summary of the characteristics of the conducted measurements, research and results

The dataset used in publications P1 and P2 consisted of spectral data of 60 plants and 1074 measurement points. Additionally, 195 spectroscopic images from the AVIRIS programme, covering 700–1130 nm wavelengths, were used in P2. For P1, the average values for six spectral ranges were calculated (600–690 nm, 750–930 nm, 950–1030 nm, 1040–1130 nm, 1450–1850 nm, 2000–2400 nm), and novel spectral disease indices were proposed. The indices were quotients of two selected ranges. The result of the P1 publication was the identification of the most relevant wavelengths for late blight detection on tomatoes, while P2 focused on creating a backpropagation artificial neural network for the same task. The authors of P1 indicate the most important spectral ranges for late blight detection in tomato crops (750–930 nm in the first place, then 950–1030 nm and 1040–1130 nm). The researchers noted five specific, narrow wavelengths characterized by a high reflectance difference between diseased and healthy plants (625 nm, 850 nm, 1050 nm, 1500 nm and 2100 nm). Based on that observation, they developed novel vegetation indices (VI). The authors of P2 indicate spectral ranges for distinguishing healthy and infected objects: 750–1350 nm for field measurements and 700–1105 nm for AVIRIS imaging spectroscopy. The study was conducted on plants with symptoms visible to the naked eye.
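The idea of an index built as a quotient of two range averages, as described for P1, can be sketched as follows. This is only an illustration of the general construction, not the formula from [24]: the helper names `band_mean` and `band_ratio_index`, the pair of ranges chosen in the example, and the synthetic spectrum are all assumptions.

```python
import numpy as np

def band_mean(wavelengths, reflectance, lo, hi):
    """Mean reflectance over the closed wavelength range [lo, hi] in nm."""
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return float(reflectance[mask].mean())

def band_ratio_index(wavelengths, reflectance, num_range, den_range):
    """Quotient of the mean reflectance in two spectral ranges (hypothetical helper)."""
    numerator = band_mean(wavelengths, reflectance, *num_range)
    denominator = band_mean(wavelengths, reflectance, *den_range)
    return numerator / denominator

# Synthetic spectrum on a 1 nm grid (illustrative values, not measured data).
wavelengths = np.arange(400, 2501)
reflectance = np.where(wavelengths < 700, 0.1, 0.5)

# Ratio of the 750-930 nm range to the 600-690 nm range.
index = band_ratio_index(wavelengths, reflectance, (750, 930), (600, 690))
```

Averaging over a range before taking the quotient makes such an index less sensitive to noise in any single band than a ratio of two individual wavelengths would be.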

In the P3 study, analysis of a correlation coefficient spectrum, analysis of a B-matrix from partial least squares regression (PLS), and a stepwise multiple linear regression procedure (SMLR) were used to diagnose bacterial leaf spot of tomato. The studied dataset consisted of 156 spectra. The authors of the P3 publication noted a significant correlation between disease severity and spectral absorbance at several wavelengths: 384 nm, 626 nm, 691 nm, and 761 nm. The spectral region of 750–760 nm was identified as meaningful in all of the approaches mentioned above. Additionally, the researchers indicated the wavelengths 395 nm, 400 nm, 630 nm, and 633–635 nm as significant.

The authors of the P4 publication implemented three different approaches to distinguishing diseased from healthy plants. The goal set by the researchers in this study was to distinguish control plants from those infected by tomato spotted wilt virus (TSWV) using hyperspectral imaging in the 400–2500 nm spectral range. The first approach applied in the study was a feature extraction method based on 12 existing vegetation indices. In the second approach, the whole available spectral range was used. In the third approach, probabilistic topic modeling (PTM) was used. PTM is a method usually used in natural language processing (NLP) for determining the topic of a document. The researchers treated the hyperspectral image as a document, and the measurement values for each wavelength were treated as words in the document.

The dataset used in P4 consisted of imaging of 30 diseased and 30 control plants of 1 variety of tomato. Plants were observed for 21 days. During this time, six measurement sessions were taken from the 3rd to the 21st day after inoculation. During the sessions, 133 diseased leaves and 103 control leaves were measured. The Kullback-Leibler divergence, also known as relative entropy, was used in the study to estimate the distance between the measurement distribution of diseased and control objects.
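For reference, the discrete Kullback-Leibler divergence mentioned above is \(D_{KL}(P \,\Vert\, Q) = \sum_i p_i \log (p_i / q_i)\). The sketch below shows one way to evaluate it on two histograms. How P4 discretized the measurement distributions is not described here, so the histogram inputs, the `eps` guard, and the function name are assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(P || Q) in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize the inputs so that both sum to 1.
    p = p / p.sum()
    q = q / q.sum()
    # eps guards against log(0) and division by zero for empty bins.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Illustrative histograms of reflectance values (made-up counts).
diseased_hist = [5, 20, 50, 25]
control_hist = [30, 40, 20, 10]

d_forward = kl_divergence(diseased_hist, control_hist)
d_reverse = kl_divergence(control_hist, diseased_hist)
```

The divergence is not symmetric, so `d_forward` and `d_reverse` generally differ; it is zero only when the two distributions coincide.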

The best results were achieved using the full wavelength range. The method based on full SWIR wavelength attained an \(F_1\) score equal to 0.92. For VNIR, the \(F_1\) score reached 0.94. However, all analyzed approaches obtained an \(F_1\) score equal to or higher than 0.8. The authors pointed out that no specified wavelength is the most significant for disease classification issues.

The study described in P5 focused on using spectroradiometer measurements in the 380–1023 nm spectral range to train a multiclass classification model for early blight, late blight, and control plants. The study concentrated mainly on the wavelengths 442, 508, 573, 696, and 715 nm. To this end, 310 measurements were used: 120 of healthy plants, 120 of plants infected by early blight, and 70 of plants infected by late blight. The obtained classification accuracy ranged from 97.1 to 100%.

The P6 publication covers a study of 57 spectral vegetation indices (SVIs). Based on the conducted studies, the researchers indicated the four most significant SVIs, which refer to six relevant wavelengths, i.e., 445, 450, 690, 707, 1070, and 1200 nm. A total of 445 measurements were used in the study, and the aim of the work was a model that classifies four classes: control (74 measurements), infected but asymptomatic plants (77 measurements), diseased plants with visible symptoms at an early stage of disease (148 measurements), and diseased plants with visible symptoms at a late stage of disease (146 measurements). The obtained accuracy reached 100% for the control, asymptomatic, and late-stage classes.

The authors of P7 report 15 additional bands between 455 and 666 nm as the most relevant compared to the presented results. However, all of the wavelengths mentioned in P7 lie in the visible spectrum (434.9\(-\)680.02 nm), as the authors focused on this specific spectral range and attribute it to the absorption of chlorophylls (430 to 480 nm and 640 to 700 nm), carotenoid pigments (450 to 480 nm and 600 to 650 nm), and xanthophylls (520 to 580 nm). For the diseases studied (bacterial speck, bacterial spot), the most important wavelengths were found to be in the blue-green and red VIS regions of the electromagnetic spectrum.

3 Materials and methods

The following part of the manuscript aims to detail the investigation procedure. The whole research involved four main components: the plant experiments, the spectroscopy acquisitions, the development of the machine learning models and the drawing of the necessary conclusions. This chapter provides the required information on how this procedure was carried out and how it could be reproduced.

3.1 The experiment description

The ASD FieldSpec 4 Hi-Res spectroradiometer was used to perform the acquisitions. The device measures reflectance, transmission, radiance, or irradiance of the tested sample in the spectral range of 350–2500 nm, with a spectral resolution of 3 nm at 700 nm and 8 nm at 1400 nm and 2100 nm. The spectral sampling width was 1.4 nm for the UV-VNIR range (350–1000 nm) and 1.1 nm for the VNIR-SWIR range (1001–2500 nm). The device is built of three detectors: one 512-element silicon near-infrared sensor (VNIR: 350–1000 [nm]) and two InGaAs photodiode-based, 2 Stage TE Cooled Graded detectors (SWIR\(_1\): 1000–1800 [nm] and SWIR\(_2\): 1800–2500 [nm]). Its performance is characterized by a wavelength reproducibility of 0.1 [nm], a wavelength accuracy of 0.5 [nm] for the average error of the wavelength calibration fit, a wavelength accuracy of ±1 [nm] for any one line, and a Noise Equivalent Radiance of 1.0\(\times 10^{-9}\) [W/cm\(^2\)/nm/sr], 1.4\(\times 10^{-9}\) [W/cm\(^2\)/nm/sr], and 2.2\(\times 10^{-9}\) [W/cm\(^2\)/nm/sr] for the detectors VNIR at 700 [nm], SWIR\(_1\) at 1400 [nm], and SWIR\(_2\) at 2100 [nm], respectively. In this study, reflectance measurements were used.

The measurements that form the basis of the analyzed data set were made from September 10, 2019 until December 20, 2019. Leaves were first removed from the investigated plants, which at that time had between 30 and 40 leaves, and immediately taken to the measuring station to avoid deterioration. This also means that measurements on different days were taken from different leaves but from the same plants, which had been previously inoculated and treated in the same way in separate vegetative chambers for each pathogen. Measurements were performed in laboratory conditions in vegetation chambers (phytotrons). Figure 1 shows the time of sowing (light green), planting (green), and infecting plants (purple), as well as the period of taking measurements on individual phytotrons (blue).

Two varieties of tomatoes were used in the experiment: Benito and Polfast. Six test plants were prepared for each of the varieties. Plants were infected with the five pathogens listed in the introduction to this article. In addition, a control plant was not treated with any pathogen. The test was carried out in three measurement cycles. The inoculation (infection) took place on September 10, 2019 in the first cycle, on November 12, 2019 in the second cycle, and on December 9, 2019 in the third cycle. The first symptoms of infection visible to the naked eye appeared 3–5 days after the infection.

Fig. 1

Experiment Gantt chart presenting: sowing, planting and infecting days on a timeline and taking samples; every strip represents one measurement day

An artificial light source in the form of two identical halogen lamps was used to illuminate the tested objects properly. On two sides we set up two Ushio Eurostar Reflekto MR16 bulbs with a color temperature of 3000 [K], a narrow \(12^{\circ }\) spot beam spread, and a luminous intensity of 11,000 [cd].

A spectroradiometer calibration is required before every measurement session. The calibration consists of measuring the white reference and the dark current, where white reference refers to an object with nearly 100% reflectance and dark current refers to the current generated within a detector in the absence of any external photons. A Spectralon reference panel made of polytetrafluoroethylene (PTFE) and sintered halon was used as the white reference. Its reflectance is over 99% in the 400–1500 nm range and over 95% in the 250–2500 nm range [58].

A single reflectance measurement is the result of averaging five successive spectral curves. The measurements were made at three distances between the measuring instrument and the infected leaves: 5, 30, and 60 cm. The analysis of the measurements made at different heights showed that the reflectance differs depending on the height of the measurement. Because of light scattering and the parameters of the spectrometer optical fiber, the reflectance measurements of the same Solanum lycopersicum leaves taken from greater distances (30 and 60 cm) are subject to significant external disturbances. At a later stage of the project, we found that the accuracy of leaf targeting and the repeatability of scans were higher when the instrument was placed 5 cm above the sample. The resulting observed area has a diameter of 2.2 cm, which aids precise leaf targeting. The characteristics of the various scan heights were also highly distinct and could hinder machine learning modeling. Therefore, the measurements taken at a height of 5 cm from the tested object were used (Fig. 2).

Fig. 2

Spectrum of all measurements normalized for a height of 5 cm (control measurements—green, infected objects—red)
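The 2.2 cm observed diameter reported for the 5 cm distance is consistent with simple cone geometry, \(d = 2h\tan(\theta/2)\), if a full field-of-view angle of \(\theta = 25^{\circ}\) is assumed for the bare optical fiber. That angle is not stated in the text, so the calculation below is a plausibility check under that assumption rather than a description of the actual setup.

```python
import math

def spot_diameter_cm(distance_cm, fov_deg=25.0):
    """Diameter of the observed circle for a conical field of view.

    fov_deg is the full cone angle; 25 degrees is an assumed value.
    """
    return 2.0 * distance_cm * math.tan(math.radians(fov_deg) / 2.0)

# Observed diameter at the three measurement distances used in the study.
diameters = {d: round(spot_diameter_cm(d), 1) for d in (5, 30, 60)}
# → {5: 2.2, 30: 13.3, 60: 26.6}
```

Under this assumption the observed area at 30 and 60 cm would be much wider than a single tomato leaflet, which agrees with the observation that scans from greater distances were more disturbed by the surroundings.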

3.2 Structure of the dataset

The initial data processing consisted of isolating incorrect data, such as incorrect calibration measurements and objects that were not the subject of the project (e.g., background of the tested object) from the correctly performed measurements. Then the metadata encoded in the measurement reference numbers was extracted and added to the data set, creating a model matrix (tidy dataset) [59].

As a result, the number of primary measurements was reduced from 72,156 to 58,186 by screening out the measurement errors, and then to 11,634 by determining the medians of the measurement series. Error detection was based on a straightforward visual inspection of the spectroscopy scans, filtering out obvious outliers and out-of-range spectral curves from the dataset. Finally, the measurements made from a height of 5 cm were selected as the least affected by background noise and, at the same time, the most reliable in the research process, which resulted in 3877 reliable, unique measurements fully described with metadata. The whole process of the experiment and data collection is depicted in Fig. 3.
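The reduction of each measurement series to its median could look roughly like the sketch below. The series identifier is assumed to come from the metadata encoded in the measurement reference numbers; the function name, the identifiers, and the toy values are hypothetical.

```python
import numpy as np

def median_per_series(series_ids, spectra):
    """Collapse repeated scans into one median spectrum per series id."""
    series_ids = np.asarray(series_ids)
    spectra = np.asarray(spectra, dtype=float)
    unique_ids = np.unique(series_ids)
    # Band-wise median over all scans that share a series id.
    medians = np.vstack(
        [np.median(spectra[series_ids == sid], axis=0) for sid in unique_ids]
    )
    return unique_ids, medians

# Toy example: two series, two bands per scan (made-up reflectance values).
ids = ["a", "a", "a", "b", "b"]
scans = [[0.10, 0.50], [0.12, 0.52], [0.90, 0.51], [0.20, 0.40], [0.22, 0.44]]
unique_ids, medians = median_per_series(ids, scans)
```

The band-wise median is robust to a single deviating scan in a series: the 0.90 reading in series "a" does not shift the result, whereas it would pull a mean upwards.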

The calculations and data processing in this study were done using Python-based open-source software (licensed under GPLv3), applying the scikit-learn, NumPy, SciPy, pandas, Seaborn, and Matplotlib scientific libraries.

Fig. 3

Diagram of the measurement data recording process (from measurements to a tidy dataset)

4 Classification using machine learning algorithms and their evaluation

Several studies have reported promising results for the identification of plant disease symptoms, mainly from images and in situations where the symptoms are already visible, using convolutional neural networks for tomato [60,61,62,63,64] as well as for other plants such as potato [65], cucumber [66], or rice [67] (the latter study also exploited thermal imagery to improve modeling performance and to detect disease symptoms that cannot be detected externally). Some researchers, similarly to the present study, explored the potential of hyperspectral information for the detection of both symptomatic and non-symptomatic cases, in that case for a single pathogen [68]. Others extended the investigation by also testing segmentation algorithms for pathogens on visually monitored tomato fields [69, 70].

Numerous machine learning techniques have proven useful for plant disease classification. Even linear methods, such as the ridge classifier (RC) [71], logistic regression (LR), and linear support vector classification (linear SVC), can provide a sufficient model in some reported cases [28]. More complex ensemble machine learning techniques, which combine several weaker sub-models, are advised in a number of papers [72,73,74]. These include, among others, random forest (RF) [28, 73, 74], modified RF [75], light gradient boosting machine (LGBM) [72, 76], and extreme gradient boosting classifier (XGB) [74]. Some of these classifiers were tested on similar data collected from tomato samples, and the ensemble machine learning models yielded promising results [28, 73]. Therefore, the models listed above were selected for further investigation in this study.

The range of machine learning methods for detecting plant diseases is quite extensive. Researchers have employed a variety of methods, including disease indices (P1[24]), linear modeling (P3[38]), probabilistic approaches (P4[26]), and nearest neighbor classifiers (P5[35]). These methods provide a useful set of tools for interpreting results later on. For certain investigative purposes, it may be necessary to employ more complex architectures when the searched patterns are not easily identifiable. These architectures may include neural networks (P2[36]), support vector machines[77] and convolutional neural networks[78] that support the discovery of more abstract pattern representations.

To conduct the experiment efficiently and evaluate the performance of the resulting models, a cross-validation procedure should be applied [79, 80]. To compare the performance of the prepared models, we decided to use the \(F_1\) metric, which allows us to indirectly check the precision and recall results and is given in Eq. 3. For results calculated using the validation set, let TP be the number of true positives, i.e., correctly indicated infected samples, TN the number of true negatives, i.e., correctly indicated non-infected samples, FP the number of false positives, i.e., non-infected samples incorrectly classified as infected, and FN the number of false negatives, i.e., infected samples missed by the examined classifier. Let:

$$\begin{aligned} Precision = \frac{TP}{TP + FP} \end{aligned}$$
(1)

and let:

$$\begin{aligned} Recall = \frac{TP}{TP + FN} \end{aligned}$$
(2)

then the \(F_1\) metric, which combines the information from both of those metrics, can be denoted as:

$$\begin{aligned} F_1 = 2\cdot \frac{Precision \cdot Recall}{Precision + Recall} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \end{aligned}$$
(3)

Such an approach provides more detailed information about model performance than simple accuracy, which takes into account only the number of correct predictions divided by the total number of predictions and does not reveal the full complexity of the case. The use of such a metric for the evaluation of pathogen classification was also proposed in [81].
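Equations 1 to 3 can be evaluated directly from the confusion counts, as in the short sketch below. The counts in the example are made up for illustration and are not results of this study; the function name is an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts (Eqs. 1-3)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return precision, recall, f1

# Illustrative counts: 180 infected samples found, 30 false alarms, 20 missed.
precision, recall, f1 = precision_recall_f1(tp=180, fp=30, fn=20)
```

Note that TN does not enter any of the three formulas, so \(F_1\) focuses on how well the infected class is recognized.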

5 Results

The investigation described had two main objectives, therefore the results section provides two separate parts to support both. The first is concerned with disease classification and the second focuses on the search for spectral bands that are relevant to specific diseases.

Initial classification trials were performed in order to select machine learning algorithms for further research. The best results were obtained by the following classifiers: LGBM—light gradient boosting machine, linear SVC—linear support vector classification, LR—logistic regression, RF—random forest, RC—ridge classifier, XGB—extreme gradient boosting. These algorithms were used in the actual in-depth study.

In order to increase the effectiveness of the classification of Solanum lycopersicum cultivation diseases, a step-by-step approach was chosen. The first classification [STEP 1, Fig. 4] was performed for the base set (measurements obtained with a spectroradiometer without additional processing). In the next step, the set was normalized (we performed min-max normalization, stretching the data to the range [0, 1]), and the classification was performed again using the previously selected algorithms [STEP 2, Fig. 4].
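A minimal sketch of the min-max normalization of STEP 2 is given below. The text does not say whether the scaling was applied per spectral band or per spectrum; the sketch assumes per-band scaling with limits learned on the training data, which mirrors the behavior of scikit-learn's `MinMaxScaler`. The helper names and toy values are hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Learn per-band minimum and span from the training spectra."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    # Constant bands get a span of 1 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_min_max(data, lo, span):
    """Map each band to [0, 1] using limits learned on the training set."""
    return (data - lo) / span

# Toy spectra: three samples, two bands (made-up reflectance values).
train = np.array([[0.05, 0.40], [0.15, 0.60], [0.10, 0.50]])
lo, span = fit_min_max(train)
scaled = apply_min_max(train, lo, span)
```

Reusing `lo` and `span` from the training data when scaling the test data keeps test information out of the preprocessing, at the cost of test values occasionally falling slightly outside [0, 1].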

Fig. 4

Diagram of the subsequent stages of the research experiment

Then, because the distributor of the measuring equipment indicated a high measurement uncertainty that is difficult to quantify in the initial (350–475 nm) and final (2400–2500 nm) ranges of the measurement spectrum, it was decided to remove these ranges from the data set, and the classification process was repeated [STEP 3, Fig. 4]. We therefore do not examine these two narrow spectral regions at either end of the spectra obtained with our spectrometer, as we consider the measurements there less reliable. The readings in the first spectral region reflect the characteristics of the halogen light source used for this study, which has lower emission in the UV to blue part of the spectrum. At the other end, the measurement is weaker due to the characteristics of the InGaAs sensor used by the spectrometer. Therefore, in order to avoid providing data with a lower signal-to-noise ratio, we omit these two extreme parts of the measurements. We consider the omission of these spectral ranges a limitation of this study. In future work we plan to address them, for example with additional blue LED array illumination, which has been reported to be helpful [82], and with additional calibration targets included in the process [83].
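Removing the two marginal ranges in STEP 3 amounts to masking columns of the data matrix. The sketch below assumes spectra stored on a 1 nm wavelength grid and treats the stated limits (475 nm and 2400 nm) as belonging to the removed parts; both are assumptions, as is the function name.

```python
import numpy as np

def keep_central_bands(wavelengths, spectra, low_cut_nm=475, high_cut_nm=2400):
    """Drop bands at or below low_cut_nm and at or above high_cut_nm."""
    wavelengths = np.asarray(wavelengths)
    keep = (wavelengths > low_cut_nm) & (wavelengths < high_cut_nm)
    return wavelengths[keep], np.asarray(spectra)[:, keep]

# Assumed 1 nm grid over 350-2500 nm; zeros stand in for real spectra.
wavelengths = np.arange(350, 2501)
spectra = np.zeros((3, wavelengths.size))
kept_wavelengths, kept_spectra = keep_central_bands(wavelengths, spectra)
```

With these assumptions, 1924 of the 2151 bands remain (476 nm to 2399 nm).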

The next step was to limit the feature space in the classification process. For this purpose, the Recursive Feature Elimination (RFE) method was used with the overall goal of reducing the feature space to 50 spectral bands at the end of STEP 4 and, if necessary, with a deeper reduction in STEP 5. We therefore applied two stages of the elimination process. The first feature elimination stage [STEP 4, Fig. 4] exploited the RFE to limit the number of features to 50 bands with recursive elimination of 50 features in each subsequent iteration. As a result, 50 spectral features (bands) were selected for later inclusion in the final classifier, for which the \(F_1\) measure was calculated.

The second stage [STEP 5, Fig. 4] applied dynamic elimination of the features. It differed from STEP 4 in that the \(F_1\) score was determined after every elimination round rather than only at the end. Based on the \(F_1\) score, the best elimination stage was determined for each of the classification cases considered. The step of 50 features per iteration was initially chosen arbitrarily, as it provided the expected performance improvements without significantly slowing down the entire process. We later tested other values of this parameter: smaller steps substantially increased the number of required computations, while larger steps led to a loss of accuracy. Therefore, we kept the step of 50 features.
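The two elimination stages can be illustrated with a hand-written loop that drops the 50 weakest features per round and records the cross-validated \(F_1\) score after each round. This is a sketch on synthetic data, not the pipeline used in the study: `RidgeClassifier` stands in for whichever estimator drives the ranking, features are ranked by the absolute value of their weights, and the function name, fold count, and data sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

def stepwise_elimination(X, y, step=50, min_features=50, cv=3):
    """Drop the `step` weakest features per round; record CV F1 each round."""
    active = np.arange(X.shape[1])
    history = []
    while True:
        model = RidgeClassifier()
        score = cross_val_score(model, X[:, active], y, cv=cv, scoring="f1").mean()
        history.append((active.copy(), score))
        if active.size <= min_features:
            break
        # Rank the remaining features by the magnitude of their weights.
        model.fit(X[:, active], y)
        weights = np.abs(model.coef_).ravel()
        n_drop = min(step, active.size - min_features)
        keep = np.sort(np.argsort(weights)[n_drop:])
        active = active[keep]
    # STEP 5 analogue: pick the round with the highest F1 score.
    best_features, best_score = max(history, key=lambda item: item[1])
    return best_features, best_score, history

# Synthetic stand-in for the spectra: 120 samples, 300 "bands".
X, y = make_classification(n_samples=120, n_features=300, n_informative=10,
                           random_state=0)
best_features, best_score, history = stepwise_elimination(X, y)
```

The last entry of `history` corresponds to the fixed 50-feature result of STEP 4, while `best_features` corresponds to the best round as in STEP 5. Because the ranking here is fitted on all samples before scoring, the recorded scores are optimistic; nesting the ranking inside each fold would avoid that.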

During this process, the cross-validation method was used. We performed data splitting for cross-validation with respect to the plant diseases studied. We aimed to distribute the samples for each disease and control group evenly between the subsets. This also meant that the vegetative chambers were equally represented, as the plants inoculated with different diseases were planted separately. Secondly, we also stratified the data samples collected in the following days across the subsets. However, we restricted the placement of scans of the same leaf in the same fold to avoid later processing very similar scans in the same algorithm training phase.

The last stage [STEP 6, Fig. 4] was to fine-tune the hyperparameters of the individual classifiers for the best step from STEP 5 by applying a randomized hyperparameter search with cross-validation. To address the disease detection task, we applied several well-established machine learning algorithms that fulfill two main criteria: they allow achieving high performance and support model interpretability by exploring the importance of features. An essential step in the development of these detectors was the search for an advantageous hyperparameter configuration; therefore, we tested the following parameter ranges for each model:

  • LGBM: with learning rates of the range from \(10^{-6}\) to \(10^{-1}\), minimal number of required samples per trained tree leaf between 5 and 100 with step 5, and with the maximum allowed depth of each trained tree from 2 to 52 with step 5,

  • Linear SVC: with regularisation parameter C between 1 and 1000 and with \(L_1\) or \(L_2\) regularisation,

  • LR: with regularisation parameter C between 1 and 1000 and with \(L_1\) or \(L_2\) regularisation,

  • RF: with the maximum number of submodels included from 200 to 2000 with step 100, with the maximum allowed depth of each trained tree from 2 to 52 with step 5, with the minimum required number of samples at each tree leaf node between 1 and 16, and minimum number of samples needed to perform a node split between 2 and 10,

  • RC: with learning rates of the range from \(10^{-4}\) to \(10^0\),

  • XGB: with learning rates of the range from \(10^{-4}\) to \(10^{-1}\), with allowed percentage of samples to be used per each tree between 1/5 and 4/5 with step 1/5, with the maximum allowed depth of each trained tree from 2 to 52 with step 5, and with the maximum number of submodels included from 200 to 2000 with step 100.
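A randomized search over one of the listed spaces can be sketched with scikit-learn's `RandomizedSearchCV`, here for the linear SVC with C between 1 and 1000 and \(L_1\) or \(L_2\) regularisation. The text gives only the bounds, so the log-uniform sampling of C, the number of sampled configurations, the fold count, and the synthetic data are assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the reduced feature set.
X, y = make_classification(n_samples=150, n_features=40, n_informative=8,
                           random_state=0)

param_distributions = {
    "C": loguniform(1, 1000),      # bounds from the text; distribution assumed
    "penalty": ["l1", "l2"],
}

search = RandomizedSearchCV(
    # dual=False is required for the L1 penalty in LinearSVC.
    LinearSVC(dual=False, max_iter=5000),
    param_distributions=param_distributions,
    n_iter=10,                     # assumed number of sampled configurations
    scoring="f1",
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

The integer ranges listed for the tree-based models (for example, depths from 2 to 52 with step 5) can be passed in the same dictionary form as plain lists, e.g. `list(range(2, 53, 5))`.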

Table 2 Training results for selected classifiers at various study steps for individual or all disease data combined (ALL). The resulting \(F_1\) scores for the following steps are listed in columns (\(F_1^{res_0}\), ..., \(F_1^{res_5}\))

The results presented below correspond to the activities performed at each stage. Results for all the evaluated models are presented in Table 2 and are, respectively:

  • result 0—tidy dataset,

  • result 1—tidy dataset and min-max normalization,

  • result 2—tidy dataset and elimination of marginal ranges of the reflectance spectrum,

  • result 3—tidy dataset and min-max normalization and result for the full recursive feature elimination process,

  • result 4—tidy dataset and min-max normalization and the best result obtained in the process of recursive feature elimination,

  • result 5—tidy dataset and min-max normalization and the best result obtained in the process of recursive feature elimination and fine-tuning of the classifier hyperparameters.

The ALL column presents the binary classification results (\(F_1\) score) distinguishing healthy plants from sick plants, regardless of the disease (CS vs AN + BS + EB + LB + SL). The following columns are the results of the classification of individual diseases in relation to the control sample (CS vs AN, CS vs BS, CS vs EB, CS vs LB, CS vs SL, respectively).

All classifiers were trained on the same training dataset (2/3 of all measurements; n = 854 samples) and tested on a separate test dataset (1/3 of the collected measurements; n = 420 samples) that did not participate in the learning process. The training and test sets for all experiment variants were balanced, in the sense that they consisted of 50% samples of infected plants and 50% samples of the control group.
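One way to obtain a balanced 2/3 to 1/3 split is sketched below. The text states that the sets were balanced but not how; random undersampling of the larger class is assumed here, and the function name and toy labels are hypothetical.

```python
import random

def balanced_split(labels, test_fraction=1 / 3, seed=0):
    """Undersample to a 50/50 class balance, then split each class."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Both classes are cut down to the size of the smaller one.
    n_per_class = min(len(by_class[0]), len(by_class[1]))
    n_test = round(n_per_class * test_fraction)
    train, test = [], []
    for members in by_class.values():
        chosen = rng.sample(members, n_per_class)
        test.extend(chosen[:n_test])
        train.extend(chosen[n_test:])
    return sorted(train), sorted(test)

# Toy labels: 0 = control, 1 = infected (synthetic, not the study's counts).
labels = [0] * 30 + [1] * 90
train_idx, test_idx = balanced_split(labels)
```

With balanced classes, an uninformative classifier that always predicts "infected" reaches an \(F_1\) score of about 0.67, which gives a baseline against which the reported scores can be read.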

Fig. 5 The importance of particular ranges of the reflectance spectrum in the process of disease classification. The white dashed line marks the overlapping spectral ranges for all the analyzed diseases

For the sub-models for individual infections (i.e., CS vs. AN, and so on), detection is based only on the data for that pathogen and the control set. Each of these sub-models is therefore trained and tested on a smaller subset of samples, containing only the measurements for the given infection together with the control data.

The classification performance improved at almost every stage of the experiment. Only STEP 3 (elimination of the marginal ranges of the reflectance spectrum) resulted in worse scores. For the ALL classification (CS vs. AN + BS + EB + LB + SL), the resulting \(F_1\) score was 0.879.

The detection of BS, EB, and LB was characterized by slightly lower scores (0.872, 0.877, and 0.866, respectively), while the remaining diseases were detected more effectively (AN: 0.894, SL: 0.896). After performing the entire procedure described, including the hyperparameter tuning, the largest improvements were observed for logistic regression (in 4 cases) and linear SVC (in 2 cases). The smallest improvement in the \(F_1\) score, 0.084, was obtained for AN, while the greatest improvement, 0.144, was achieved for BS.

6 Discussion

Another goal of the study was to determine the contribution of individual spectral bands (model features) to the decision-making process. Depending on the classifier used, either feature weights (“coef_” attribute) or feature significance coefficients (“feature_importances_” attribute) were determined. The coefficients mentioned above can be treated as indicators of the significance of features for models such as, inter alia, linear regression or the ridge classifier. The feature weights were used for the RidgeClassifier, linear SVC, and logistic regression classifiers, while the feature significance coefficients were used for the Random Forest, XGBClassifier, and LGBMClassifier classifiers.
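The choice between the two attributes can be sketched with a small helper. It is only an illustration: the helper name is hypothetical, taking the absolute value of the weights is an assumption (a common convention, not confirmed by the text), and the objects below are stand-ins rather than trained classifiers. For a binary problem, the “coef_” attribute of the linear scikit-learn classifiers has a single row of weights, which is what the helper reads.

```python
from types import SimpleNamespace


def feature_scores(model):
    """Return one non-negative significance score per feature."""
    if hasattr(model, "coef_"):
        # Binary linear models: a single row of weights, one per feature.
        # Assumption: the magnitude of a weight measures its significance.
        return [abs(weight) for weight in model.coef_[0]]
    if hasattr(model, "feature_importances_"):
        # Tree ensembles expose importances directly.
        return list(model.feature_importances_)
    raise AttributeError("model has neither coef_ nor feature_importances_")


# Stand-in objects mimicking fitted models (not real classifiers).
linear_like = SimpleNamespace(coef_=[[0.5, -2.0, 0.1]])
tree_like = SimpleNamespace(feature_importances_=[0.2, 0.7, 0.1])

linear_scores = feature_scores(linear_like)
tree_scores = feature_scores(tree_like)
```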

The recursive feature elimination (RFE) method was used in the study, based on the feature significance coefficients and feature weights mentioned earlier. The method iteratively eliminates the least significant features of the data set. In the considered data set, the features were data columns representing successive wavelengths of the reflectance spectrum. In each iteration, the fifty features that showed the least significance in the decision-making process were removed from the set, and the classifier was then retrained on the smaller data set. The elimination process ended when the set was reduced to 50 features. Depending on the data set under consideration, the whole process consisted of 39 or 44 elimination steps. Each feature was assigned a ranking index that made it possible to determine the stage at which the feature was eliminated. For each stage, the \(F_1\) score was also determined on the test data set.
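The elimination loop described above can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: `rank_features` and `score` are hypothetical callbacks standing in for retraining the classifier and computing the \(F_1\) score on the test set, and the toy callbacks below exist only to make the example runnable.

```python
def recursive_feature_elimination(features, rank_features, score,
                                  step=50, min_features=50):
    """Drop the `step` least significant features per round until
    `min_features` remain; record the score after every round."""
    remaining = list(features)
    history = [(list(remaining), score(remaining))]
    while len(remaining) > min_features:
        # Hypothetical callback: retrain on `remaining` and return
        # a {feature: significance} mapping.
        significance = rank_features(remaining)
        n_drop = min(step, len(remaining) - min_features)
        ordered = sorted(remaining, key=lambda f: significance[f])
        weakest = set(ordered[:n_drop])
        remaining = [f for f in remaining if f not in weakest]
        history.append((list(remaining), score(remaining)))
    return history


def toy_rank(subset):
    # Toy assumption: significance grows with the band index.
    return {feature: feature for feature in subset}


def toy_score(subset):
    # Toy assumption: the score peaks when 300 bands are left.
    return -abs(len(subset) - 300)


history = recursive_feature_elimination(range(2000), toy_rank, toy_score)
n_steps = len(history) - 1
best_subset, best_score = max(history, key=lambda item: item[1])
```

With 2000 starting features, removing 50 per round down to 50 gives 39 elimination steps in this sketch; the number of bands in the authors' data sets is not stated in this section. Selecting the entry of `history` with the highest score corresponds to choosing the best elimination step.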

Based on the \(F_1\) score, the elimination step with the highest value was identified. In this way, all the bands involved in the decision-making process at that step were listed. Then it was determined how often particular features were taken into account by a given classifier. On this basis, a graph of the share of spectral features in the decision-making process was created.

As part of an in-depth analysis, adjacent spectral bands were grouped into ranges (50 bands each), and the most frequently repeated spectral ranges were identified (the 70% least frequently occurring bands were removed).
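One possible reading of this grouping step is sketched below. The text leaves the details open, so several choices are assumptions: bands are addressed by integer index, occurrences are counted per group of 50 adjacent bands, and the 30% most frequent groups are kept. The function name and the toy selections are hypothetical.

```python
from collections import Counter


def frequent_ranges(selected_sets, group_size=50, keep_fraction=0.3):
    """Count band occurrences per group of adjacent bands and keep the
    most frequent groups, returned as (first_band, last_band) pairs."""
    counts = Counter()
    for bands in selected_sets:
        for band in bands:
            counts[band // group_size] += 1
    # Most frequent first; ties are broken by the lower group index.
    ranked = sorted(counts, key=lambda group: (-counts[group], group))
    n_keep = max(1, round(len(ranked) * keep_fraction))
    kept = sorted(ranked[:n_keep])
    return [(g * group_size, (g + 1) * group_size - 1) for g in kept]


# Toy band selections, e.g. one per classifier (not real results).
selections = [
    list(range(0, 50)) + list(range(100, 150)),
    list(range(100, 150)) + list(range(200, 210)),
    list(range(100, 150)) + list(range(0, 50)),
    list(range(450, 460)),
]
top_ranges = frequent_ranges(selections)
```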

On the basis of the conducted research, the following spectrum ranges for the analyzed diseases were obtained (the numbers provided indicate the measures of the determined ranges of the reflectance spectrum with a width of around 43 [nm]):

  • AN: 371, 413, 455, 708, 751, 793, 835, 961, 1004, 1046, 1088, 1805, 1847, 1889, 2438 [nm],

  • BS: 371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1847, 1889, 1932, 2395, 2438 [nm],

  • EB: 371, 413, 455, 666, 708, 751, 793, 835, 919, 961, 1004, 1046, 1847, 2269, 2438 [nm],

  • LB: 371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1046, 1847, 1889, 2395, 2438 [nm],

  • SL: 371, 455, 708, 751, 793, 835, 919, 961, 1004, 1805, 1847, 1889, 1974, 2142, 2269, 2438 [nm],

  • ALL: 371, 413, 455, 498, 666, 708, 751, 793, 835, 919, 961, 1004, 1847, 2395, 2438 [nm].

The above-mentioned spectral reflectance ranges are illustrated in Fig. 5.

As shown in Fig. 5, several ranges of the reflectance spectrum are common to all analyzed diseases and to the case where the control was compared with a pooled set of samples of all diseases. These are the following ranges: 371, 455, 708, 751, 793, 835, 961, 1004, 1847, 2438 [nm]. The study also indicated spectral ranges specific to each of the analyzed diseases; the relevant summary is presented in Table 3.

The conducted study indicated several specific ranges of the reflectance spectrum, including:

  • Spectral bands influential in the classification of three or four examined diseases (413, 666, 1046, 1889 [nm]),

  • Bands characteristic of two examined diseases: EB, SL (919, 2269 [nm]), AN, SL (1805 [nm]), BS, LB (2395 [nm]),

  • Spectral ranges specific for one analyzed disease: 1088 [nm] (AN), 1932 [nm] (BS), 1974 and 2142 [nm] (SL).
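The common and disease-specific ranges can be derived from the per-disease lists with simple set operations, as in the sketch below. The values are copied from the lists above; the out-of-sequence SL entry is taken as 1805 nm, consistent with the band attributed to AN and SL in the text.

```python
from collections import defaultdict

# Spectral ranges [nm] per disease, copied from the lists above.
bands = {
    "AN": {371, 413, 455, 708, 751, 793, 835, 961, 1004, 1046, 1088,
           1805, 1847, 1889, 2438},
    "BS": {371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1847,
           1889, 1932, 2395, 2438},
    "EB": {371, 413, 455, 666, 708, 751, 793, 835, 919, 961, 1004,
           1046, 1847, 2269, 2438},
    "LB": {371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1046,
           1847, 1889, 2395, 2438},
    # SL entry read as 1805 nm (see the AN, SL band in the text).
    "SL": {371, 455, 708, 751, 793, 835, 919, 961, 1004, 1805, 1847,
           1889, 1974, 2142, 2269, 2438},
}
pooled = {371, 413, 455, 498, 666, 708, 751, 793, 835, 919, 961, 1004,
          1847, 2395, 2438}  # the ALL case

# Ranges shared by every disease, and whether ALL contains them too.
common = sorted(set.intersection(*bands.values()))
common_in_pooled = set(common) <= pooled

# Remaining ranges, grouped by the number of diseases they occur in.
by_count = defaultdict(dict)
for band in sorted(set.union(*bands.values()) - set(common)):
    diseases = [name for name in bands if band in bands[name]]
    by_count[len(diseases)][band] = diseases
```

Here `common` reproduces the ten shared ranges, while `by_count[1]`, `by_count[2]`, and the entries for three and four diseases correspond to the three groups listed above.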

Table 3 The specific identified reflectance spectrum ranges for the classification of studied diseases

Fig. 6 Comparative report concerning the spectral bands indicated in the literature review (P1-P6: publications described in the introduction to the article; LB, BS, EB: results of the experiment described in this article)

Additionally, the spectral ranges mentioned in the literature review were compared with the results obtained during the research for LB, BS, and EB diseases. The summary of the results is visualized in Fig. 6.

The range of visible and near-infrared light appears both in the cited publications and in the authors’ results. The conducted research also indicated other bands of the reflectance spectrum, significant in the classification of the studied diseases of Solanum lycopersicum, located outside these ranges. These are, among others:

  • for LB: 1847, 1889, 2395, 2438 [nm] (the P1 publication indicates: 2100 [nm]),

  • for BS: 1847, 1889, 1932, 2395, 2438 [nm],

  • for EB: 1847, 2269, 2438 [nm].

7 Conclusion

The research confirmed that reflectance spectroscopy measurements can be used to detect some Solanum lycopersicum diseases. Regarding objective 1, the best results were obtained for the Logistic Regression and Linear SVC classifiers. The Logistic Regression classifier achieved the highest \(F_1\) score, 0.896 after hyperparameter adjustment, for Septoria Leaf Spot disease.

The procedure proposed in the article (Stage 1–6), the analysis of which was the second main objective of the study, resulted in an improvement in the classification results in almost every analyzed case. The most significant improvement of the \(F_1\) score was obtained for the Bacterial Speck disease (from 0.728 to 0.872).

The analysis of the significance of the spectral bands in the classification of diseases (objective no. 3) indicated overlapping reflectance ranges for various diseases (Fig. 5). Additionally, different and specific reflectance ranges were indicated, which were particularly important in classifying specific diseases (Table 3). The indicated ranges can be used to develop new methods and measuring instruments dedicated to diagnosing Solanum lycopersicum diseases and, after further research, diseases of other crops.

The method described in the article can also be applied to other crops and their diseases. However, this requires the development of appropriate, dedicated data sets consisting of measurements made in laboratory or field conditions. An attempt can also be made to transfer the presented spectroscopy-based diagnostic method to airborne and spaceborne platforms (e.g., unmanned aerial vehicles, manned aircraft, satellites) [24]. The literature review and the research described in the article show that collecting the most extensive possible set of measurement data, covering various disease cases, may allow for the creation of quick and possibly reliable methods of plant disease diagnostics. In subsequent work, the authors plan to attempt to classify diseased plants regardless of the stage of disease development, including the stage when symptoms are not yet visible (analysis of the classification effectiveness in the days following infection). It also seems advisable to use the constructed dataset to analyze the effectiveness of disease classification using various artificial neural network (ANN) models, including deep learning.