Introduction

Tomato (Solanum lycopersicum L.) is a garden plant consumed by millions of people worldwide every season. Because tomatoes grow well in moderately dry soil, they are easy to produce in large quantities. Ninety percent of farmers grow tomatoes on their farms [1]. Tomato, one of the most popular vegetables in people's daily life, has great economic importance for countries [2, 3]. According to data from the Food and Agriculture Organization of the United Nations (FAO), tomato production is constantly increasing worldwide: it was 182 million tons in 2017 and reached 243.6 million tons in 2019. The largest contributors to this production are China, India, Turkey and the USA, in that order [4, 5]. Tomato has become one of the main food products in the world because of its high daily consumption and its rich content of fiber, vitamins, minerals and antioxidants [6]. In addition to their rich nutritional content, tomatoes provide protection against diseases such as hepatitis, hypertension, inflammation and cancer [7]. However, the nutrient content and composition of tomatoes differ according to species and to genetic and environmental factors [8]. Tomatoes are available in a wide variety of sizes, colors and shapes. In general, however, tomato varieties fall into four types: cherry, Italian, salad, and Santa Cruz. Among these, cherry tomato cultivation is more profitable than the others [9]. In this context, studies are being carried out to increase the productivity of cherry tomato cultivation [10].

Because of high production and increasing demand, it has become important to distinguish, package and transport tomato species. Tomatoes are very sensitive to production, transportation and packaging conditions, and damage reduces the quality of the product. In addition, the fertilization and inoculation practices during the growing process, as well as the growing region, affect the yield and content of the tomato. Given such different production and growing conditions, nutritional security and successful discrimination of crops have gained importance [3]. Fast, contactless product discrimination can currently be achieved only with computerized systems. Manual sorting of tomatoes according to their physiological ripeness is difficult: the process is time-consuming and expensive, crops can be damaged by impact, and sorting errors can occur. An automation system designed for this purpose will reduce cost, increase the speed and accuracy of discrimination, and increase the yield and productivity of the crop [11]. In addition, the development of smart agricultural applications based on computerized systems both enables the development of non-destructive methods and accelerates the economic growth of many countries [12].

Applications based on computer vision and artificial intelligence for automatic discrimination of agricultural products have increased recently. In this context, shape, color and texture features are frequently used for automatic differentiation of crops. To classify crops by these distinctive features with computers, various machine learning algorithms that learn these features are used [12]. Nyalala et al. [6] developed an application based on computer vision and machine learning methods for estimating tomato mass and volume. They made predictions with five different regression methods using 2D and 3D features obtained from depth images of tomatoes. The Radial Basis Function (RBF)-Support Vector Machine (SVM) model provided the most successful prediction. El-Bendary et al. [13] proposed a study based on color features for the automatic classification of tomato ripeness stages. They used Principal Component Analysis (PCA) for feature extraction and SVM and Linear Discriminant Analysis (LDA) for classification. In that study, which used tenfold cross-validation, up to 90.80% accuracy was achieved with SVM. Semary et al. [14] developed an application that classifies infected and uninfected tomato fruits according to their color and texture (Gray Level Co-occurrence Matrix (GLCM)) features. The authors used PCA for feature reduction and SVM for classification. In that study, tomatoes were classified as infected or uninfected with 92% accuracy based on the external surface. Dhakshina Kumar et al. [15] developed a system based on texture (GLCM), shape and color characteristics to classify tomatoes according to their maturity. They also segmented the defects in tomatoes with the Gabor wavelet transform. These defective regions were then divided into three classes according to their color and geometrical characteristics. SVM was used in the classification stages. Ropelewska et al. [3] proposed a texture-based application for the discrimination of tomatoes based on flesh and skin images. Images from six different tomato species were converted to R, G, B, L, a, b, X, Y, and Z color channels. Texture features extracted from the different color channels were classified by various machine learning algorithms. Finally, Ireri et al. [16] classified tomatoes according to color, texture, and shape. The LAB color space was preferred for color features and GLCM for texture properties. SVM with different kernel functions was used for grading recognition and RBF-SVM was used for defect detection.

The objective of this study was to combine fluorescence spectroscopic data and machine learning algorithms for distinguishing greenhouse tomato samples. According to the selected spectroscopic data, different machine learning algorithms from Meta, Functions, Bayes, Trees, Rules and Lazy groups distinguished greenhouse tomatoes with high success.

The salient contributions of this manuscript are summarized below.

  • Use of fluorescence spectroscopy data for distinguishing tomato cultivars,

  • Use of machine learning algorithms for classification,

  • Discrimination of tomato species with high accuracy.

By applying an author-designed mobile fiber-optic configuration that uses the phenomenon of light fluorescence, it is possible to create non-invasive methods for field evaluation of tomatoes. So far, there are no data on their characterization by the proposed method. The aim is to validate fluorescence spectroscopy in the proposed configuration as a non-invasive method for the evaluation of two different varieties of greenhouse tomatoes. If the research in this study is applied successfully, it is expected to initiate the creation of an interdisciplinary method for tomato analysis.

A literature survey was conducted to identify similar research. It showed that the described experimental approach for tomato analysis has not yet been applied nationally or internationally. This gives us reason to claim that this is the first time that fluorescence spectroscopy in combination with machine learning has been applied to the analysis of tomatoes in field conditions. This study marks the beginning of such work and will benefit scientists who are developing research directions in optoelectronics or machine learning for the analysis of vegetable crops.

Materials and methods

Experimental design

For processing, the fluorescence study provided ten averaged graphs from two different varieties of greenhouse tomatoes. The graphs were averaged after the 15th measurement of each sample. Each sample yielded over a thousand spectral data points at different wavelengths. The samples were measured on site at the farm where they were grown, as the fluorescence signal acquisition scheme is mobile. In this way, damage to the sample is avoided. The samples were measured immediately after recultivation. The mobile spectral installation (Fig. 1) for the study of fluorescence signals was designed specifically for the rapid analysis of plant biological samples.
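As an illustration of the averaging step described above, the following minimal Python sketch averages repeated scans of one sample point by point. It assumes that the 15 measurements of a sample share a common wavelength grid and that a plain arithmetic mean is used; the function name and the synthetic intensities are hypothetical and are not taken from the acquisition software.

```python
from statistics import fmean


def average_spectra(scans):
    """Average repeated scans of one sample, wavelength by wavelength.

    `scans` is a list of equal-length lists of intensities; all scans are
    assumed to be recorded on the same wavelength grid.
    """
    if not scans:
        raise ValueError("at least one scan is required")
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must have the same number of points")
    # zip(*scans) groups the intensities recorded at the same wavelength.
    return [fmean(point) for point in zip(*scans)]


# Hypothetical data: 15 repeated scans of one sample on a 4-point grid.
scans = [[1.0 + 0.1 * k, 2.0, 3.0 - 0.1 * k, 4.0] for k in range(15)]
mean_spectrum = average_spectra(scans)
```

A real spectrum would contain over a thousand points per scan, but the averaging itself is unchanged.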

Fig. 1 Mobile experimental installation used for fluorescence spectroscopy

The mobile experimental installation used for fluorescence spectroscopy contains the following blocks:

  • Laser diode (LED) emitting at 245 nm, with a supply voltage in the range of 3 V. It is housed in a hermetically sealed TO39 metal housing. The emitter has a voltage drop of 1.9 to 2.4 V and a current consumption of 0.02 A. The minimum value of its reverse voltage is −6 V.

  • Forming optics, consisting of a hemispherical lens made of N-BAK2 glass. The post-LED forming optics can be characterized mainly by their refractive, dispersive and thermo-optical properties, as well as by their transparency in the UV range [240–280 nm].

  • Quartz glass with an area of 4 cm². It is transparent to visible light and to ultraviolet rays and is free of inhomogeneities that scatter light. Owing to its purity, its optical and thermal properties exceed those of other types of glass. Light absorption in quartz glass is weak.

  • CMOS detector with a photosensitive area of 1.9968 × 1.9968 mm. Its sensitivity ranges from 200 to 1100 nm. Its resolution is δλ = 5. The profile of the detector sensor projections along the X and Y axes is also designed for very small amounts of data, unlike widely used sensors.

The radiation is led from the LED through the forming optics block by means of a quartz fiber. The secondary radiation (visible spectrum) from the sample illuminated by the incident UV radiation is coupled to the CMOS detector by means of light-guide optics. The quartz multimode fiber has a step refractive-index profile and a numerical aperture of 0.22. In the CMOS detector, the light signal is converted into a digital electrical signal, which is transferred over a USB 2.0 cable to a laptop for storage and analysis. The obtained fluorescence spectroscopic data were subjected to statistical analysis involving discriminant analysis to distinguish two different varieties of greenhouse tomato.
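To give a feel for what the quoted numerical aperture implies, the short Python sketch below converts it into an acceptance half-angle using the standard relation NA = n·sin(θ). It assumes that the fiber end face is in air (n = 1.0), which the text does not state; the variable names are illustrative only.

```python
import math

numerical_aperture = 0.22  # value stated for the quartz multimode fiber
n_outside = 1.0            # assumption: the fiber end face is in air

# NA = n * sin(theta)  =>  theta = asin(NA / n)
half_angle_deg = math.degrees(math.asin(numerical_aperture / n_outside))
full_cone_deg = 2.0 * half_angle_deg

print(f"acceptance half-angle: {half_angle_deg:.1f} deg")
print(f"full acceptance cone:  {full_cone_deg:.1f} deg")
```

Under this assumption the fiber collects light within a half-angle of roughly 12.7° around its axis.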

Statistical analysis

The samples of greenhouse tomatoes were discriminated with the use of the WEKA machine learning application (Machine Learning Group, University of Waikato) [17,18,19]. The differences in spectroscopic data of greenhouse tomato 1 and greenhouse tomato 2 varieties were analyzed. The flowchart presenting the applied procedure is shown in Fig. 2. After obtaining fluorescence spectroscopic data, the first step of the analysis included the attribute selection performed using the Ranker search method with the OneR Attribute Evaluator. The spectroscopic data with the highest power to discriminate the tomato samples were selected. The discriminative models were built based on selected features using a tenfold cross-validation mode. The machine learning algorithms from the Meta, Functions, Bayes, Trees, Rules and Lazy groups were used. In the case of each group, algorithms providing the most satisfactory discrimination performance metrics were selected. The results were determined as confusion matrices including an accuracy for each sample, average accuracy, time taken to build the model, Kappa statistic, mean absolute error, root mean squared error, and relative absolute error. These performance metrics were computed using the WEKA application.
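The attribute selection step can be pictured with the following minimal Python sketch. It is not WEKA's OneR Attribute Evaluator: as a simplified stand-in, each attribute is scored by the training accuracy of the best single-threshold rule on that attribute alone, and the attributes are then sorted by score, which mirrors the role of the Ranker search. The function names and the toy data are hypothetical.

```python
def stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold rule on one attribute."""
    classes = sorted(set(labels))
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Baseline: always predict the majority class (no threshold at all).
    best = max(labels.count(c) for c in classes) / n
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Predict the majority class on each side of the threshold.
        correct = (max(left.count(c) for c in classes)
                   + max(right.count(c) for c in classes))
        best = max(best, correct / n)
    return best


def rank_attributes(rows, labels):
    """Return (score, attribute_index) pairs, best attribute first."""
    n_attrs = len(rows[0])
    scores = [(stump_accuracy([row[j] for row in rows], labels), j)
              for j in range(n_attrs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy data (hypothetical): attribute 0 is noise, attribute 1 separates the classes.
rows = [[0.3, 1.0], [0.9, 1.2], [0.1, 1.1],
        [0.7, 3.0], [0.2, 3.3], [0.8, 2.9]]
labels = ["A", "A", "A", "B", "B", "B"]
ranking = rank_attributes(rows, labels)
```

In this toy example attribute 1 is ranked first because one threshold separates the two classes perfectly, whereas attribute 0 reaches only four correct cases out of six.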

Fig. 2 The flowchart of the stages of distinguishing greenhouse tomato samples using fluorescence spectroscopic data and machine learning algorithms
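The cross-validation stage of the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WEKA implementation: folds are filled class by class so that each fold keeps the class balance, and a nearest-centroid rule stands in for the WEKA classifiers. The data are synthetic, with ten hypothetical samples per variety.

```python
def stratified_folds(labels, k):
    """Assign sample indices to k folds, class by class (round robin)."""
    folds = [[] for _ in range(k)]
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def nearest_centroid_predict(train_rows, train_labels, row):
    """Predict the class whose mean feature vector is closest to `row`."""
    centroids = {}
    for c in set(train_labels):
        members = [r for r, lab in zip(train_rows, train_labels) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(sorted(centroids), key=lambda c: dist2(centroids[c], row))


def cross_validate(rows, labels, k=10):
    """Return a confusion matrix as {(actual, predicted): count}."""
    classes = set(labels)
    confusion = {(a, p): 0 for a in classes for p in classes}
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        for i in test_idx:
            pred = nearest_centroid_predict(train_rows, train_labels, rows[i])
            confusion[(labels[i], pred)] += 1
    return confusion


# Synthetic data (hypothetical): two selected intensities per sample.
rows = ([[1.0 + 0.05 * i, 5.0] for i in range(10)]
        + [[3.0 + 0.05 * i, 5.0] for i in range(10)])
labels = ["tomato 1"] * 10 + ["tomato 2"] * 10
confusion = cross_validate(rows, labels, k=10)
```

Each sample is predicted exactly once, by a model that never saw it during training, and the counts are accumulated into a single confusion matrix.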

Results and discussion

The greenhouse tomato samples were discriminated completely correctly by the models developed from fluorescence spectroscopic data using the following algorithms: Multi-Class Classifier (group of Meta), Logistic (group of Functions), Bayes Net (group of Bayes), PART (group of Rules), and J48 (group of Trees) (Table 1). The average accuracy, as well as the accuracies for both greenhouse tomato 1 and greenhouse tomato 2, was equal to 100%. This means that all cases belonging to the actual class of greenhouse tomato 1 were correctly classified as greenhouse tomato 1 and all cases from the class of greenhouse tomato 2 were correctly included in the predicted class of greenhouse tomato 2. The Kappa statistic equal to 1.0 and the mean absolute error, root mean squared error and relative absolute error equal to 0 also indicate a completely correct classification. The time taken to build the model was shortest for Bayes Net (0.02 s). The models built using the other algorithms were also quick to build, with the longest time, 0.24 s, for Logistic.

Table 1 The results of discrimination of greenhouse tomatoes for models built based on fluorescence spectroscopic data using selected algorithms providing an average accuracy of 100%

For some algorithms from different groups, greenhouse tomato samples were distinguished with an average accuracy of 95% (Table 2). The cases of greenhouse tomato 1 were classified with an accuracy of 100%, whereas greenhouse tomato 2 samples were correctly discriminated in 90% of cases and the remaining 10% were incorrectly classified as greenhouse tomato 1. These results were obtained for LDA and QDA (Quadratic Discriminant Analysis) from the group of Functions, Naive Bayes from the group of Bayes, Hoeffding Tree from the group of Trees, Filtered Classifier, Logit Boost and Random Committee from the group of Meta, and LWL from the group of Lazy. A high Kappa statistic of 0.9 was observed, and the errors were low: a mean absolute error of 0.05, a root mean squared error of 0.22 and a relative absolute error of 10%. The time taken to build the model ranged from 0.00 s (Naive Bayes, LWL) to 6.42 s (QDA).
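The relationship between these metrics and the confusion matrix can be checked with the following minimal Python sketch. It rests on assumptions that the text does not state: the two classes are equally sized (ten hypothetical samples each), every prediction is a hard 0/1 class assignment, and the relative absolute error is taken against a baseline that always predicts the class proportions. WEKA's own computation may differ in detail.

```python
import math


def two_class_metrics(matrix):
    """Metrics from a 2x2 confusion matrix.

    matrix[i][j] = number of samples of actual class i predicted as class j.
    Assumes hard 0/1 predictions, so a wrong sample has absolute error 1.
    """
    n = sum(sum(row) for row in matrix)
    observed = (matrix[0][0] + matrix[1][1]) / n
    actual = [sum(matrix[i]) / n for i in range(2)]
    predicted = [(matrix[0][j] + matrix[1][j]) / n for j in range(2)]
    # Agreement expected by chance, from the row and column proportions.
    expected = sum(a * p for a, p in zip(actual, predicted))
    kappa = (observed - expected) / (1.0 - expected)
    mae = 1.0 - observed
    rmse = math.sqrt(mae)
    # Baseline that always outputs the class proportions as probabilities.
    baseline_mae = sum(a * (1.0 - a) for a in actual)
    return {
        "accuracy": observed,
        "kappa": kappa,
        "mae": mae,
        "rmse": rmse,
        "rae_percent": 100.0 * mae / baseline_mae,
    }


# 100% of class 1 and 90% of class 2 correct, ten samples per class (assumed).
metrics = two_class_metrics([[10, 0], [1, 9]])
```

Under these assumptions the sketch gives an accuracy of 0.95, a Kappa statistic of 0.9, a mean absolute error of 0.05, a root mean squared error of about 0.22 and a relative absolute error of 10%, in line with the values reported above.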

Table 2 The performance metrics of discrimination of greenhouse tomatoes for models developed based on fluorescence spectroscopic data using selected algorithms providing an average accuracy of 95%

Slightly lower accuracies of discrimination of greenhouse tomato samples were determined for the models developed using other machine learning algorithms. For example, an average accuracy of 90% was obtained for JRip from the group of Rules and 85% for FLDA (Fisher Linear Discriminant Analysis) from the group of Functions (Table 3). For the model built using the JRip algorithm, both classes were correctly discriminated with an accuracy of 90%. For the model developed using FLDA, the samples were correctly distinguished in 80% of cases for greenhouse tomato 1 and 90% for greenhouse tomato 2. With the FLDA algorithm, the Kappa statistic of 0.7 was the lowest, and the mean absolute error of 0.15, root mean squared error of 0.39, and relative absolute error of 30% were the highest.

Table 3 The results of discrimination of greenhouse tomatoes for models built based on fluorescence spectroscopic data using selected algorithms providing an average accuracy of 90% and 85%

The obtained results confirmed the effectiveness of the approach combining fluorescence spectroscopy and machine learning to distinguish greenhouse tomato varieties. The literature data also reported the usefulness of spectroscopy for the classification of tomatoes. Tomatoes belonging to different genotypes were classified using visible and short-wave spectroscopy, least-squares support vector machines (LS-SVM), soft independent modeling of class analogy (SIMCA), discriminant analysis (DA) and discriminant partial least-squares (DPLS) [20]. Additionally, spectroscopy was used to diagnose tomato diseases [21]. Furthermore, spectroscopy, i.e., spatially offset Raman spectroscopy (SORS) or fluorescence spectroscopy can be used for the evaluation of tomato maturity and postharvest ripening during storage [22,23,24,25]. Further studies may focus on the use of deep learning to discriminate tomatoes with a high probability.

Conclusions

Fluorescence spectroscopic data have proven to be highly effective for distinguishing greenhouse tomatoes. Numerous machine learning algorithms distinguished two different tomato varieties with high accuracy according to these data. The most successful discrimination was achieved with the Multi-Class Classifier, Logistic, Bayes Net, PART and J48 models in the Meta, Functions, Bayes, Rules and Trees groups, with which all greenhouse tomato samples were correctly classified. With other learning algorithms, discrimination accuracies of 95%, 90% and 85% were obtained. These results are quite satisfactory in terms of successful non-destructive and automatic discrimination of greenhouse tomato species. The research can be expanded to include more varieties and to apply deep learning to the discrimination. In addition, the variety of data can be increased by taking images of tomatoes with a camera and adding color features to the fluorescence spectroscopy data. The successful conduct of this research allows for the formulation of interdisciplinary non-invasive diagnostic methods combining fluorescence spectroscopy with multiple machine learning algorithms as rapid application tools in tomato breeding programs. By monitoring the signal intensity, it will be possible to monitor the stability of a breeding line and its common blacks with an established cultivar of the same species. This will allow the crossing of specific genotypes or parental samples, with the aim of obtaining representatives with better indicators.