Introduction

Botanically, fenugreek (Trigonella foenum-graecum L.) is an annual leguminous herb of the family Fabaceae. This species is cultivated in Europe, India, Turkey, China, Canada and Northern Africa [1–3]. The seeds have a strong aroma and a bitter taste, and they are very hard and difficult to grind [4]. Fenugreek seeds have been commonly used in traditional medicine as a laxative, a digestive aid, and a remedy for cough and bronchitis. They may also help control cholesterol and triglyceride levels as well as high blood sugar levels in diabetics. Fenugreek seeds added to cereals and wheat flour or made into gruel and given to nursing mothers can increase breast milk production. Excess intake of fenugreek seeds by pregnant women may put them at risk of premature childbirth [5].

Fenugreek seeds have a high protein content of around 20–30 %, and they are rich in lysine and tryptophan. They contain 45–60 % carbohydrates, mostly mucilaginous fiber (30 % soluble and 20 % insoluble fiber). They also contain a small amount of oils (5–10 %), pyridine alkaloids (mostly trigonelline), a few flavonoids, free amino acids, sapogenins, vitamins and volatile oils. Fenugreek seeds are a source of minerals such as copper, potassium, calcium, iron, selenium, zinc, manganese and magnesium. Potassium is an important component of cell and body fluids that helps control heart rate and blood pressure by countering the effects of sodium. Iron is essential for red blood cell production [3, 6, 7]. Apart from the chemical composition of seeds, their physical properties are also important in the design of transport and storage equipment: (1) the size and shape of seeds are important for designing separating, harvesting, sizing and grinding machines; (2) bulk density and porosity affect the structural loads of machines. The findings of this study can expand the existing knowledge of these properties, which is an important consideration for both farmers and equipment designers.

The aim of this study was to develop a statistical model for the classification of fenugreek seeds into homogeneous groups based on their physical properties.

Materials and methods

Materials

The experimental materials were fenugreek seeds. Plants were grown in experimental fields at the Research Station in Tomaszkowo, Poland (53°43′N, 20°24′E), operated by the University of Warmia and Mazury in Olsztyn. The fenugreek seeds came from the field experiments, which were conducted in two seasons. The experiment was established on typical brown soil of quality class IVa with a light loam overlay. The soil was characterized by a slightly acidic pH, moderate content of phosphorus and potassium, and low levels of magnesium. Nitrogen concentration was determined at 1.13 g kg−1 of soil. The following mineral fertilizers were applied: 30 kg N ha−1 (urea), 30.5 kg P ha−1 (46 % granular triple superphosphate) and 83 kg K ha−1 (60 % potassium salt). Harvested seeds were cleaned, then dried to 12 % moisture content (±0.5 %) and transported to the laboratory, where they were stored at 7 °C. The moisture content of seeds was measured repeatedly before the experiment. All of the analyzed physical parameters were determined at 12 % seed moisture content. The experimental set with a division into plots is presented in Table 1.

Table 1 Experimental set containing the division into groups and number of groups

The statistical analysis covered two experimental designs. In the first stage of the analysis, the plots were divided into 10 experimental groups. In the second stage, selected experimental groups were combined into a single data set (Table 1). The first group (B_0) consisted of seeds without Rhizobium inoculation or chemical protection (group 29). It was assumed that unlike in uninoculated seeds (A_0), inoculation (A_1) combined with various types of chemical protection would differentiate the physical parameters of seeds. Cultivation conditions varied with respect to seed inoculation, chemical protection and agricultural measures. Homogeneous groups of fenugreek seeds were identified based on the following variables: thousand seed weight, true density, bulk density and porosity, geometric parameters (linear dimensions, shape factors), color and surface texture.
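
The source does not state how porosity was derived from the measured densities. A minimal sketch, assuming the common definition in which porosity is the share of the bulk volume not occupied by seeds; the function name and the sample values are hypothetical:

```python
def porosity_percent(bulk_density: float, true_density: float) -> float:
    """Return porosity (%) from bulk and true density given in the same units."""
    if bulk_density <= 0 or true_density <= 0:
        raise ValueError("densities must be positive")
    if bulk_density > true_density:
        raise ValueError("bulk density cannot exceed true density")
    # Share of the bulk volume that is void space between seeds.
    return (1.0 - bulk_density / true_density) * 100.0


# Hypothetical densities in kg m^-3, chosen only to illustrate the call.
print(round(porosity_percent(750.0, 1250.0), 1))
```

Because porosity is a function of the two densities under this definition, the three variables are not independent, which is worth keeping in mind when they enter the same discriminant model.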

Image analysis

The image analysis workstation consisted of an Epson Perfection 4490 Photo flatbed scanner and a graphics processing unit with an Intel Pentium D 830 processor. SilverFast Epson v 6.4.3 scanning software supported full control of scanned images. The images were analyzed with modified MaZda v 4.7 software [8]. An image segmentation algorithm was developed in the first stage of computer-aided analysis. Fenugreek seeds were spread on a contrasting background to facilitate the determination of the binarization threshold. The binarized image was superimposed onto the original image and scaled with calipers to the spatial resolution of the original image. A set of measured linear dimensions and shape factors was developed according to the method proposed by Zapotoczny [9]. In texture measurements, the region of interest (ROI) covered whole seeds. Images were converted to RGB, L*a*b* and U, V, S color channels before texture measurements. Images from selected channels are presented in Fig. 1.
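
The binarization step can be illustrated with a short sketch. The source does not name the thresholding rule; Otsu's method is assumed here purely for illustration. The function names, the tiny 4 × 4 gray-level image and the assumption of bright seeds on a dark background are all hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(image, threshold):
    """Mark pixels brighter than the threshold as seed (1), others as background (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Hypothetical image: a bright 2 x 2 "seed" on a dark background.
image = [
    [20, 22, 25, 21],
    [23, 180, 190, 24],
    [22, 185, 200, 20],
    [21, 23, 22, 25],
]
flat = [p for row in image for p in row]
mask = binarize(image, otsu_threshold(flat))
```

A contrasting background makes the gray-level histogram clearly bimodal, which is the situation in which a global threshold of this kind works well.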

Fig. 1

Images from individual color channels for selected fenugreek seed. RGB—color space, R—red, G—green, B—blue, U—blue chrominance, Y—luminance, L—lightness, a*—red/green, b*—yellow/blue, V—red chrominance

Statistical analysis

The data set was standardized according to the procedure described by Zielińska et al. [10]. The results were processed in Statistica 12.0 (StatSoft Inc., Tulsa, USA), Matlab Statistics Toolbox (MathWorks Co., USA) and WEKA 3.7.6 (Machine Learning Group at the University of Waikato). Statistical analysis was carried out in several stages. First, supervised and unsupervised feature selection was carried out to reduce the number of variables; the goal was to find the smallest set of variables that still ensured high classification accuracy. Next, various methods of multidimensional analysis were used to search for a statistical model that classified seeds into homogeneous groups. The methods of variable reduction are described in detail in subsection 2.3.1 (Variable selection), and the methods of multidimensional analysis in subsection 2.3.2 (Discriminant analysis).
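
The standardization procedure of Zielińska et al. [10] is not reproduced in the text. A minimal sketch, assuming ordinary z-score standardization of each variable (zero mean, unit sample standard deviation); the function name and the sample values are hypothetical:

```python
from statistics import mean, stdev


def standardize(column):
    """Rescale one variable to zero mean and unit sample standard deviation."""
    m = mean(column)
    s = stdev(column)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(x - m) / s for x in column]


# Hypothetical thousand seed weights in g, for illustration only.
z = standardize([16.2, 17.0, 15.8, 16.6])
```

Standardization of this kind puts variables measured in different units, such as density and seed weight, on a common scale before they enter the same model.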

Variable selection

Variables are selected by identifying a subset of parameters that verify the hypothesis at least as effectively as the original data set. The most popular ranking methods rely on Fisher transformation, probability of misclassification with cumulative correlation coefficient and mutual information. Other ranking techniques include division and reduction algorithms, sequential search algorithms and evolutionary programming paradigms [11]. This study relied on algorithms implemented in the WEKA 3.7.6 application (Machine Learning Group at the University of Waikato): (a) Best First, (b) Rank Search, (c) Genetic Search, (d) Linear Forward Selection and (e) Greedy Stepwise. The applied methods were described in detail by Witten et al. [12]. Feature selection was particularly important in texture measurements because 1900 variables were calculated for each ROI. In the first step, six variables with the highest discriminating power were selected from each channel, which reduced the data set to 30 variables. Every selection method produced a set of repeated textures; therefore, the final data set was reduced once again, and all channels were combined to generate 30 variables for further analyses. The number of variables was not reduced in sets containing the remaining parameters. This approach ensured a 1:10 ratio of variables to cases, i.e. at least 10 cases for each explanatory (independent) variable. A model composed of color parameters and the remaining physical parameters of seeds was developed in the last stage of discriminant analysis. The above variables were combined into a single data set to improve the classification of experimental groups.
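
Ranking variables by discriminating power can be sketched as follows. The example assumes a Fisher-type score (between-group over within-group sum of squares) computed separately for each variable; it is not the WEKA or MaZda implementation, and the function names, feature names and values are hypothetical.

```python
from collections import defaultdict


def fisher_score(values, labels):
    """Between-group over within-group sum of squares for one variable."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    grand_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        between += len(members) * (m - grand_mean) ** 2
        within += sum((v - m) ** 2 for v in members)
    return float("inf") if within == 0 else between / within


def top_k_features(table, labels, k):
    """Return the names of the k variables with the highest score.

    `table` maps a variable name to its list of values, one value per seed.
    """
    ranked = sorted(table, key=lambda name: fisher_score(table[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical data: six seeds from two groups, two texture variables.
labels = ["g1", "g1", "g1", "g2", "g2", "g2"]
table = {
    "noisy": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],
    "separating": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
}
best = top_k_features(table, labels, k=1)
```

A univariate ranking like this ignores redundancy between variables, which is one reason why search-based methods such as those listed above are applied as well.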

Discriminant analysis

Data are classified by assigning a given category (class) to a new object based on the knowledge derived from the available examples. This process takes place in two steps. In the first step, a classification model is built based on the training set, whereas in the second step, the model is used to predict new cases. The model can be developed in a variety of ways, including decision trees, rough set methods, probability distribution methods, statistical models, support vector machines and neural networks [11]. In this study, the following methods were used to classify fenugreek seeds into homogeneous groups: (a) Naive Bayes, Simple Bayes, Bayesian network classifiers, (b) multilayer perceptron, (c) meta multiclass classifier, (d) NBTree classifier, (e) PART rules, (f) Lazy IBk and (g) progressive discriminant analysis, backward stepwise discriminant analysis, forward selection and backward elimination. Tenfold cross-validation (n = 10) was applied. The choice of classifier was determined by the group of variables included in the statistical model. Groups of variables forming a single data set are presented in Table 2.
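
The two-step process (build a model on training cases, then predict held-out cases) combined with cross-validation can be sketched in plain Python. The example assumes a Gaussian Naive Bayes classifier and folds formed by taking every k-th case; it is an illustration, not the WEKA implementation used in the study, and all names and data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior, feature means and feature variances of each class."""
    by_class = defaultdict(list)
    for row, lab in zip(rows, labels):
        by_class[lab].append(row)
    model = {}
    for lab, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-9
                     for col, m in zip(zip(*members), means)]
        model[lab] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one case."""
    def log_posterior(lab):
        prior, means, variances = model[lab]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


def cross_val_accuracy(rows, labels, k=10):
    """Share of correctly classified cases, each predicted while held out."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th case, offset by fold
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        model = fit_gaussian_nb(train_rows, train_labels)
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return correct / n


# Hypothetical, well-separated data: 20 cases per group, two variables.
rows = ([[0.1 * i, 1.0 + 0.05 * i] for i in range(20)]
        + [[5.0 + 0.1 * i, 6.0 + 0.05 * i] for i in range(20)])
labels = ["A_0"] * 20 + ["A_1"] * 20
accuracy = cross_val_accuracy(rows, labels, k=10)
```

Each case is predicted exactly once by a model that never saw it, so the reported accuracy is not inflated by cases that also shaped the model.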

Table 2 Sets of variables for constructing the models of discrimination

Multidimensional analyses—disjoint sets

The objective of multidimensional analyses was to build a statistical model capable of assigning individual cases from a given experimental group to a homogeneous group based on selected variables. The models were developed separately based on selected texture variables, geometric parameters, color parameters and the remaining variables describing the physical properties of seeds. Following preliminary studies, only the variables from the season that yielded more accurate classifications were used in further analyses. Multidimensional analyses evaluated the influence of agrotechnical practices on selected physical properties of seeds across experimental groups. High accuracy of classification would confirm such an influence.

Results and discussion

Image texture

The results of discrimination analyses for experimental groups are presented in Table 3. Overall discrimination accuracy ranged from 50 to 65 %. The least satisfactory results were noted in group 41, where, depending on the applied discrimination method, only 20 to 42 % of seeds were correctly discriminated; this could be attributed to the production technology, in which seeds were treated only with herbicide. Similar results (28–47 %) were noted in group 42. The smallest classification error was observed in group 29, where no treatments were applied, which could suggest that agrotechnical practices influenced the texture of the analyzed seeds. The distribution of cases from backward discriminant analysis is presented in Fig. 2.

Table 3 Multiple discriminant analysis for fenugreek—texture parameters
Fig. 2

Distribution of cases from backward discriminant analysis

The data set was divided as follows: plot 29 was a separate set (B_0), whereas the remaining plots (Table 1) were assigned to groups A_0 and A_1 based on the inoculation criterion. Classification accuracy ranged from 53 to 69 %, subject to the applied discrimination method (Table 4). The smallest error of 8 % was noted in group B_0 for Naive Bayes classifiers. This result is consistent with previous findings [13], and it confirms that inoculation and other treatments influenced the physical properties of fenugreek seeds.

Table 4 Multiple discriminant analysis for fenugreek—texture parameters

Selected physical properties

In the next stage of the study, the influence of experimental factors on the physical properties of seeds—porosity, density and thousand seed weight—was analyzed (Table 5). The percentage of seeds correctly classified to a homogeneous group was higher than in the model developed based on image texture variables. The overall accuracy of discrimination ranged from 73 to 82 %. Seeds from the variant that involved seed dressing with Dithane M45, inoculation, sowing delayed by 10 days, 30-cm spacing between rows and chemical weed control (group 16) were classified with 91–96 % accuracy regardless of the applied discrimination method.

Table 5 Multiple discriminant analysis for fenugreek—physical features

The discriminant analysis of the second experimental design revealed that inoculation may produce varied effects. Discrimination accuracy was high, but more than 13 % of cases were incorrectly classified. In group B_0, where seeds were not subjected to any treatments, discrimination accuracy ranged from 75 to 87 % (Table 6).

Table 6 Multiple discriminant analysis for fenugreek—physical features

Color and spectral composition

Color is an important factor in evaluations of food quality. Varieties can be discriminated based on hue and color saturation [14]. Color measurements are performed to determine the influence of temperature on thermally processed products [15–18]. This study analyzed the impact of various agricultural production systems on color components. The yellowness index (YI E313) and spectral components in the range of 400–700 nm were also included in the model. Classification results for the first experimental design are presented in Table 7. The most accurately classified group (6) comprised uninoculated seeds subjected to all treatments. Classification accuracy ranged from 87 to 100 %, and satisfactory results were also noted in set 29, which was the control group relative to the remaining experimental groups. The least satisfactory results were observed in group 42, which consisted of seeds that were not inoculated with Rhizobium meliloti but were dressed and treated with the herbicide and fungicide during the growing season. The above could suggest that agricultural treatments influence the color parameters of fenugreek seeds.

Table 7 Multiple discriminant analysis for fenugreek—color parameters

The results for the second experimental design are presented in Table 8. The classification error did not exceed 10 % in the control group, and it was determined at only 2 % when the Naive Bayes classifier was used.

Table 8 Multiple discriminant analysis for fenugreek—color parameters

Geometric properties

The discrimination of fenugreek seeds also involved the determination of their geometric properties. Linear dimensions and shape coefficients were incorporated into the model. Classification accuracy ranged from 18 to 43 %, subject to the applied discrimination method, which could be attributed to significant variation in linear dimensions and shape factors within groups. The genetic stability of physical parameters remains unknown owing to the lack of fenugreek varieties.

The small size of fenugreek seeds (seed length ranges from 3.78 to 4.01 mm) could also make it very difficult to evaluate the impact of agrotechnical practices on the geometric properties of seeds. Similar results were noted in the second experimental design where the overall discrimination accuracy reached 44–60 %. In the control group (B_0), 65–73 % of seeds were correctly discriminated only when Bayes classifiers were used, which confirms previous assumptions.

Discriminant analysis—combined sets

The aim of the last stage of the analysis was to verify whether classifier effectiveness can be improved by combining sets of color variables and the remaining physical attributes, excluding geometric and texture parameters, into a single set. It was assumed that the overall influence of variable groups can significantly improve the classification result. The above approach was positively verified by Majumdar [19]. Before analysis, the data set was divided into two separate sets: a testing set and a training set, at the 30:70 ratio. In this approach, the discriminant model is built based on a separate data set, and it is validated on cases that did not influence the discriminant model. The results of classification performed on experimental group seeds in combined sets are presented in Table 9 and Fig. 3.
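
The 70:30 division can be sketched as below. The source does not say how cases were assigned to the two sets; the example assumes a random split carried out separately within each experimental group (a stratified split), and the function name, group labels and group sizes are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_share=0.7, seed=0):
    """Return (train_idx, test_idx) with about train_share of each group in training."""
    by_group = defaultdict(list)
    for i, lab in enumerate(labels):
        by_group[lab].append(i)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for members in by_group.values():
        rng.shuffle(members)
        cut = round(len(members) * train_share)
        train_idx += members[:cut]
        test_idx += members[cut:]
    return sorted(train_idx), sorted(test_idx)


# Hypothetical set: three groups with 10 cases each.
labels = ["A_0"] * 10 + ["A_1"] * 10 + ["B_0"] * 10
train_idx, test_idx = stratified_split(labels)
```

Splitting within groups keeps every experimental group represented in both sets, which matters when some groups contain few cases.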

Table 9 Multiple discriminant analysis for fenugreek—combined sets
Fig. 3

Distribution of cases relative to canonical variables

The discussed procedure significantly improved discrimination accuracy, and the error did not exceed 8 % for all applied methods. Experimental groups 6 and 16 were discriminated with 100 % accuracy in all cases.

The highest classification error in the range of 14–21 % was noted for groups 23, 42 and 52, and it was quite satisfactory for the needs of the analysis. In the second experimental design, groups were classified with 70–97 % accuracy. The least satisfactory results were observed for group A_1 where seeds were subjected to inoculation and chemical protection (Table 10).

Table 10 Multiple discriminant analysis for fenugreek—combined sets

The distribution of cases relative to roots 1 and 2 is presented in Fig. 4. Experimental group A_0 was separated by a considerable distance from the control group (B_0), which could imply that one of the applied treatments was responsible for the separation of uninoculated groups.

Fig. 4

Distribution of cases from discriminant analysis—backward elimination

The results noted in the first experimental design were subjected to a detailed discriminant analysis to determine variables with the greatest impact on each canonical root and variables that discriminated a given experimental group most accurately. The results of the Chi-squared test for successive roots with partial values of Wilks’ lambda and canonical values are given in Table 11. Eight canonical variables were identified, seven of which were statistically significant. Based on the values of Wilks’ lambda, only four canonical roots were taken into account in further analyses.
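
For reference, the test of successive canonical roots is commonly based on Bartlett's chi-squared approximation to Wilks' lambda. The form below is an assumption about the procedure, since the source does not give the formula. Here λ_i denotes the eigenvalue of the i-th canonical root, r the number of roots, N the number of cases, p the number of variables, g the number of groups and k the number of roots already removed.

```latex
\Lambda_k = \prod_{i=k+1}^{r} \frac{1}{1+\lambda_i},
\qquad
\chi^2_k = -\left[\,N - 1 - \frac{p+g}{2}\,\right] \ln \Lambda_k,
\qquad
\mathrm{df}_k = (p-k)(g-k-1),
\qquad k = 0, 1, \dots, r-1
```

Under this form, a value of Λ_k close to 1 means that the roots remaining after the first k carry little discriminating information, which is consistent with retaining only the leading roots for further analysis.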

Table 11 Chi-square test for the canonical roots—experimental set I

The structure of experimental factors for selected roots is presented in Table 12. The higher the value of the coefficient describing a given variable, the greater its contribution to root formation. The first canonical root was most strongly influenced by color variables, including L*a*b* parameters, yellowness index YI E313 and spectral components. The second root was determined mainly by thousand seed weight, bulk density, true density and porosity. The third root was determined mainly by the physical properties of seeds (density, porosity) and selected spectral components. The fourth root was most strongly influenced by YI E313, true density, parameter b*, porosity and spectral components, mostly in the range of 670–680 nm.

Table 12 Factor structure for selected canonical roots

The contribution of each variable to the formation of canonical roots has to be identified to describe every variant’s influence on their discrimination (Table 13).

Table 13 Impact of the experimental groups on the creation of canonical roots

The first canonical variable most effectively discriminated experimental group 6 (10.02), where all chemical protection treatments were applied in the growing season. It was less successful in discriminating groups 22 and 16, where identical chemical treatments were applied. The remaining groups were discriminated with similar accuracy, except for group 52. The first canonical root was most strongly influenced by color variables, which implies that the combined effect of inoculation and all chemical treatments induced changes in the color of fenugreek seeds. A similar correlation was observed relative to groups 22 and 16. The second canonical function accurately discriminated groups 16, 22, 29, 52 and 42 from the remaining groups. The second root was determined by thousand seed weight, porosity and density. The third root was most successful in discriminating group 16 seeds, and it had the least discriminating power for the control group (29). This study demonstrated that chemical protection alone as well as the combined application of chemical treatments and inoculation influenced the variables responsible for the third root. The fourth root most accurately discriminated uninoculated seeds (groups 42, 52, 41, 6) and group 23.

Conclusions

Statistical models for the classification of fenugreek seeds into experimental groups were developed based on variables representing five data sets: I physical properties, II geometric parameters, III spectral components, IV texture parameters from color channels and V physical properties (I) and geometric parameters (II). The classification of fenugreek seeds based on texture variables (IV) was the least accurate (the overall classification error was 35–50 %). The statistical model based on data set II was characterized by higher discriminating accuracy (classification error of 31–47 %), but it was still insufficient to conclude whether cultivation measures led to changes in seed texture. Higher classification accuracy was achieved when the models were developed with the use of data set I variables (bulk density, true density, porosity and thousand seed weight) and data set III variables (color parameters). The model combining sets of physical properties and geometric parameters of seeds (I and II) provided the highest classification accuracy, from over 84 to 100 %. The control group (29) was discriminated with 99–100 % accuracy by all classification models, which confirms that inoculation had no effect on changes in the physical properties and geometric parameters of seeds. The least satisfactory results were observed for experimental group A_1 (inoculation and chemical protection), where classification error reached 30 %. This implies that the influence of inoculation and protection treatment on the physical properties of seeds was ambiguous or that the applied cultivation measures could contribute to incorrect classification in 30 % of cases.