This section presents the dataset collection, experimental setup, experimental results, findings, and discussion, and examines the efficiency of different features with different machine learning models.
Table 4 The number of paddy seeds from each variety in the BDRICE and VNRICE datasets

Dataset and experimental configurations
In our experiments, we used two different datasets: BDRICE (prepared by us) and VNRICE [19, 27, 41]. BDRICE consists of 60,800 paddy seed images of four different varieties, while VNRICE [19, 27, 41] comprises six paddy varieties. Both datasets are described in Table 4. We used 75% of the data for training the model and 25% for testing.
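As a quick sanity check, the 75/25 split of the 60,800 BDRICE images determines the training and testing set sizes directly (a minimal sketch; the per-variety counts are given in Table 4 and are not repeated here):

```python
# Sketch: compute train/test split sizes for the BDRICE dataset
# (60,800 images, 75% training / 25% testing, as stated above).
def split_sizes(total: int, train_fraction: float = 0.75) -> tuple:
    n_train = round(total * train_fraction)
    return n_train, total - n_train

train_n, test_n = split_sizes(60_800)
print(train_n, test_n)  # 45600 15200
```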
Experimental setup

To evaluate the proposed model's effectiveness, we conducted experiments under eight different settings. The names and descriptions of the settings, together with their corresponding results, are given in Table 5.
We also evaluated the robustness of our proposed system through experiments under two further settings. In the first, we took different combinations of paddy varieties; in the second, we varied the ratio of training and testing images. All of the above experiments were conducted on the BDRICE dataset, and the best configuration was then applied to the VNRICE dataset to compare our system's performance with other models.
Machine learning model configuration

In this study, we used four different ML models: FNN, SVM, K-nearest neighbor (KNN), and decision tree (DT). We configured the FNN with 30 nodes, 3 hidden layers, 1 output layer, the tansig activation function in the hidden layers, and the sigmoid activation function in the output layer. For SVM, we used a multiclass one-vs-all model with a Gaussian kernel, where the learning parameters were the MATLAB default values. For KNN, we used 4 neighbors, the Mahalanobis distance, squared-inverse weighting, and a quadratic loss function. Finally, we configured the DT with MaxNumSplits 4, KFold 10, the bayesopt optimizer, the expected-improvement acquisition function, and repartition status false.
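For reference, the MATLAB configurations above can be collected into a plain data structure (a sketch only; the key names follow the text rather than any particular library's API):

```python
# Hypothetical summary of the four model configurations described above;
# values are taken verbatim from the text, key names are our own.
MODEL_CONFIGS = {
    "FNN": {
        "nodes_per_hidden_layer": 30,
        "hidden_layers": 3,
        "output_layers": 1,
        "hidden_activation": "tansig",
        "output_activation": "sigmoid",
    },
    "SVM": {
        "scheme": "one-vs-all",
        "kernel": "gaussian",
        "learning_parameters": "MATLAB defaults",
    },
    "KNN": {
        "n_neighbors": 4,
        "distance": "mahalanobis",
        "weight": "squaredinverse",
        "loss": "quadratic",
    },
    "DT": {
        "MaxNumSplits": 4,
        "KFold": 10,
        "optimizer": "bayesopt",
        "acquisition_function": "expected-improvement",
        "repartition": False,
    },
}
```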
Experimental results
To evaluate the performance of our experiments, we used tenfold cross-validation with four different machine learning models: FNN, SVM, K-nearest neighbor (KNN), and decision tree (DT). Performance was evaluated using four metrics: accuracy, precision, recall, and F1_score. For example, if a dataset has five classes, we obtain five values for each evaluation metric (e.g., accuracy) in each setup; for brevity, we report only the best of those values.
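The per-class metrics used here follow the standard one-vs-rest definitions; the sketch below computes them from raw counts (the TP/FP/FN/TN values are illustrative only, not results from the paper):

```python
# Standard per-class metrics computed one-vs-rest from raw counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy counts for one class (illustrative only):
print(metrics(tp=90, fp=10, fn=10, tn=390))
```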
Our experimental results are summarized in Table 5. The minimum performance was achieved under the Trad_NM setting and the maximum under the Lasso_NM setting, with a difference in accuracy between these two settings of 78.75%. The evaluation metrics indicate that Lasso_NM outperforms all the other techniques. This is also reflected in the accuracy of all the machine learning models, where the accuracy varies from 0.05% to 4%.
We present the confusion matrix for the BDRICE dataset in Fig. 4, using the features obtained from the Lasso_NM setting with the FNN model. The confusion matrix compares actual and predicted labels; the different evaluation metrics are then calculated from the matrix, and the best results among all varieties are selected.
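Constructing such a confusion matrix from actual and predicted labels is straightforward; a minimal sketch (with toy labels, not the paper's data) follows:

```python
# Build a confusion matrix from actual and predicted labels.
# Rows are actual classes, columns are predicted classes.
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

classes = ["BRRI11", "BRRI28", "BRRI29", "BRRI81"]
actual    = ["BRRI11", "BRRI28", "BRRI28", "BRRI29", "BRRI81"]  # toy labels
predicted = ["BRRI11", "BRRI28", "BRRI29", "BRRI29", "BRRI81"]
cm = confusion_matrix(actual, predicted, classes)
print(cm)  # one BRRI28 seed misclassified as BRRI29
```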
Table 5 Performance scores obtained in different settings on the various evaluation metrics using the BDRICE dataset
Machine learning model selection

We conducted experiments to determine the best machine learning model in terms of the different evaluation metrics. As Table 5 shows, the highest performance is achieved under the Lasso_NM setting, so we used it to select the best machine learning model. The experimental results are shown in Fig. 5: all models achieved high performance, with a minimum accuracy of 97.66% and a maximum of 99.28%. Across the evaluation metrics, the best and worst performance were achieved by FNN and DT, respectively, with differences in precision and F1_score of 2.58% and 1.79%.
Necessity of feature selection

We experimented to determine the importance of feature selection in machine learning, using two settings: features without filtering (AF_NM) and features selected by Lasso (LS_NM) (Table 5). Figure 6 shows that between these two settings the recall value varies from 8% to 11% and the accuracy from 7% to 9%, with the LS_NM setting scoring better in both cases. We also observe that FNN contributes significantly in terms of accuracy, whereas KNN performs well on recall. These results reveal that feature selection boosts performance and plays a vital role in developing an appropriate model.
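Lasso drives the coefficients of uninformative features to zero, so the filtering step amounts to keeping only the features with nonzero coefficients. A sketch (with made-up feature names and coefficients, not the paper's fitted model):

```python
# Keep only features whose (hypothetical) Lasso coefficients are nonzero.
def select_features(names, coefficients, eps=1e-8):
    return [n for n, c in zip(names, coefficients) if abs(c) > eps]

# Made-up names and coefficient values, for illustration only.
names = ["area", "perimeter", "H3", "T20_HOG_5", "T20_HOG_12"]
coefs = [0.0, 0.0, 0.41, -0.17, 0.02]
print(select_features(names, coefs))  # ['H3', 'T20_HOG_5', 'T20_HOG_12']
```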
Performance stability on the number of varieties

We conducted an experiment to visualize the effect of changing the number of paddy varieties on performance. First, we obtained the accuracy for two varieties, termed 2V_NM, which covers all combinations of two varieties given by the \(_{4}C_{2}\) formula. This yields 6 subsets: {(BRRI11, BRRI28), (BRRI11, BRRI29), (BRRI11, BRRI81), (BRRI28, BRRI29), (BRRI28, BRRI81), (BRRI29, BRRI81)}. All subsets were examined with tenfold cross-validation, and the average accuracy is shown in Fig. 7. Similarly, we experimented on three and four varieties, termed 3V_NM and 4V_NM, respectively. For all settings, we used the FNN model to train on the features. We observed that the accuracy changes only slightly with the number of varieties, implying that changing the number of paddy varieties does not have much effect on the overall performance of the model.
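The \(_{4}C_{2}\) enumeration above can be reproduced directly with the standard library:

```python
# Enumerate all 4C2 = 6 two-variety subsets used in the 2V_NM setting.
from itertools import combinations

varieties = ["BRRI11", "BRRI28", "BRRI29", "BRRI81"]
pairs = list(combinations(varieties, 2))
print(len(pairs))  # 6
print(pairs[0])    # ('BRRI11', 'BRRI28')
```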
Model robustness on training and testing ratio: We also examined the effect of changing the training-to-testing ratio. We considered four settings, 20:80, 40:60, 60:40, and 80:20, termed 2BY8_NM, 4BY6_NM, 6BY4_NM, and 8BY2_NM, respectively, where the numbers represent the training and testing image percentages. We applied tenfold cross-validation for all varieties using the Lasso_NM features with the FNN model; the best results are depicted in Fig. 8. We observed only a nominal performance difference between the settings, with a minimum accuracy of 98.76% for 2BY8_NM and a maximum of 99.28% for 8BY2_NM. In conclusion, the number of paddy seeds used for training and testing does not have a remarkable influence, which demonstrates the robustness of the proposed system.
Feature efficiency

To evaluate the influence of the proposed textural features over traditional features on model building, we conducted experiments in two settings. The first includes the traditional features described in Table 1, expressed as OTF_NM. The second considers the textural features of both Haralick and our newly proposed T20-HOG, denoted HTF_NM (Table 5). The experimental results are depicted in Fig. 9. We observed that the traditional features are less important than the textural features: the differences in accuracy and precision between the two settings are 64% and 65%, respectively. Moreover, on all evaluation metrics, the HTF_NM setting provides very high performance compared with the OTF_NM setting.
In our experiments, the paddy images were taken in an open environment, so their resolution and depth differ. For this reason, the values of the traditional features (height, width, area, perimeter, etc.) deviate considerably, which can lead to poor results in the OTF_NM setting. The textural features, by contrast, are not affected by these limitations, which is why the HTF_NM setting (Table 5) provides good results.
Dominance of the brand new T20-HOG: The above results show that textural features are very powerful in our setting. However, we have two different groups of textural features: our brand new T20-HOG and the existing Haralick features. We therefore analyzed the impact of each group in detail, again considering two settings: building and testing the model with only the T20-HOG features, expressed as T20_NM (Table 5), and with only the Haralick textural features, represented by HF_NM (Table 5). The results are shown in Fig. 10. The minimum result variation between the two settings is 8.29%, and the maximum is 11.66%. Overall, the performance of the T20_NM setting is remarkably high, because the T20-HOG features represent textural changes effectively and boost performance.
Robustness of the proposed T20-HOG: We compared the performance of T20-HOG (T20_NM setting) with traditional HOG features (HOG_NM setting), as shown in Table 5. The results are depicted in Fig. 11: T20-HOG outperforms HOG on all four evaluation metrics, which indicates the robustness of our newly proposed T20-HOG features.
Moreover, we experimented with T40-HOG, T30-HOG, T20-HOG, and T10-HOG features taken from the sorted HOG count values. We fed these features into the FNN to obtain the accuracy on the BDRICE dataset, then applied the Lasso feature selection technique to each feature set and recalculated the accuracy. The detailed results with and without feature selection, the number of features selected by Lasso, and the names of the selected features are presented in Table 6. The table shows that T20-HOG with feature selection provides the best result, while the lowest performance is obtained with T40-HOG without feature selection. Hence, we conclude that our proposed T20-HOG with Lasso feature selection is highly effective for paddy classification.
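The Tk-HOG construction described above amounts to sorting the HOG bins by their counts and keeping the top k (k = 20 gives T20-HOG). A sketch with a toy histogram; the real extraction operates on the full HOG histogram:

```python
# Select the top-k HOG bins by count (toy bin counts, for illustration).
def top_k_hog(counts, k):
    # Indices of the k largest counts, highest first.
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return order[:k]

hog_counts = [3, 41, 7, 29, 18, 55, 2, 11]  # made-up bin counts
print(top_k_hog(hog_counts, k=3))  # [5, 1, 3]
```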
Table 6 T10-HOG, T20-HOG, T30-HOG and T40-HOG performance comparison with and without Lasso feature selection with respect to the BDRICE dataset

Comparison with standard dataset
We also used the standard VNRICE dataset to measure our system's efficiency. Several studies have used this dataset [19, 27, 41] for paddy variety identification. Duong et al. [19] applied HOG and feature selection techniques to achieve better accuracy. Nguyen et al. [41] worked on the same dataset and classified paddy varieties using HOG and a missing-value imputation technique. Hoai et al. [27] applied different deep learning techniques to determine paddy type and achieved their highest accuracy with DenseNet121. We compared our system's accuracy with the above studies, using our best model, the Lasso_NM features (Table 5) trained with the FNN model. The comparison is shown in Table 7, from which we observe that our method is superior to the other studies applied to the VNRICE dataset. As Table 5 shows, the T20-HOG features achieve better performance than traditional HOG, which is why our system shows better accuracy than that of Nguyen et al. [41].
Table 7 Accuracy comparison of different methods applied on the VNRICE dataset

Comparison with recent works
To evaluate comparative performance, we considered a few prominent recent works on paddy classification, applied their techniques to our BDRICE dataset, and compared the results with ours. First, we considered the work of Ansari et al. [7], who trained an SVM model on twenty extracted features: seven color, nine morphological, and four textural features. They converted the input RGB images to HSV and then applied imfill and bwareaopen to remove small objects under 100 pixels. Applying their technique to our dataset yielded an accuracy of 78.21%, as presented in Table 8, which is far below our proposed method's accuracy.
We also considered the study by Javanmardi et al. [30], who used the VGG16 convolutional neural network (CNN) architecture to extract features from corn seeds and fed those features into an ANN model. We applied their technique to our BDRICE dataset with the following configuration: image size \(224 \times 224 \times 3\), depth 16, RMSprop optimizer, cross-entropy loss function, 100 max epochs, batch size 32, learning rate 0.01, and 138 M parameters. The accuracy of this experiment is 99.36%, nearly identical to that of our proposed model. Overall, we can say that our method is effective and suitable for paddy seed classification.
Table 8 Accuracy comparison with recent prominent works according to the BDRICE dataset

In summary, we can say with conviction that feature selection significantly improves performance, and interclass similarity and intraclass variability are clearly visible in our feature selection. System performance is almost stable when we change the number of paddy varieties as well as the training-to-testing ratio. The textural features are efficient in identifying paddy seed varieties. Our brand new T20-HOG feature has a notable impact on overall performance in comparison to traditional HOG, while the combination of T20-HOG and Haralick features boosts the system performance. We also applied our system to the VNRICE dataset, compared it with existing works, and observed that our system outperformed all of them, which indicates the robustness of our new system.