Abstract
Data mining is gaining popularity in agricultural production. This work presents, for the first time, the results of applying machine learning methods (decision tree, support vector machine, and K-nearest neighbor) to the classification of wheat seeds by yield properties using bioelectric indicators of the seeds. The effectiveness of the classifiers is assessed by accuracy, confusion matrices, and cross-validation of training quality. The comparison showed that the decision tree method gave the best classification results. The resulting model is simple to understand and interpret and requires no additional data preparation: there is no need to normalize the data, add dummy variables, or remove missing values. In the experiments it reached relatively high accuracy (96%) on the sample with a noise component. The K-nearest neighbor method is also recommended for classifying seeds by yield properties, although it is less accurate than the decision tree: on the sample with noise its accuracy was 91%. The support vector machine is not a promising tool for this problem, although it is highly successful in other areas.
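The evaluation procedure named in the abstract (accuracy, confusion matrix, cross-validation) can be illustrated with a minimal sketch. It is not the authors' pipeline. The data are synthetic, the two features and the class names `low_yield` and `high_yield` are hypothetical stand-ins for the bioelectric indicators, and a hand-written K-nearest neighbor classifier is used only because it fits in a few lines of standard-library Python.

```python
import math
import random
from collections import Counter


def make_synthetic_seeds(n_per_class=60, noise=0.5, seed=0):
    """Hypothetical two-feature samples for two yield classes (not the paper's data)."""
    rng = random.Random(seed)
    centers = {"low_yield": (1.0, 1.0), "high_yield": (3.0, 3.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            # Gaussian scatter around the class center plays the role of noise.
            data.append(((rng.gauss(cx, noise), rng.gauss(cy, noise)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns index the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def cross_validate(data, n_folds=5, k=5):
    """n-fold cross-validation: each fold is held out once as the test set."""
    labels = sorted({label for _, label in data})
    folds = [data[i::n_folds] for i in range(n_folds)]
    pooled = [[0] * len(labels) for _ in labels]
    scores = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        y_true = [label for _, label in test]
        y_pred = [knn_predict(train, x, k) for x, _ in test]
        cm = confusion_matrix(y_true, y_pred, labels)
        scores.append(accuracy(cm))
        # Accumulate per-fold matrices into one pooled confusion matrix.
        for r in range(len(labels)):
            for c in range(len(labels)):
                pooled[r][c] += cm[r][c]
    return labels, pooled, scores


if __name__ == "__main__":
    labels, pooled, scores = cross_validate(make_synthetic_seeds())
    print("classes:", labels)
    print("pooled confusion matrix:", pooled)
    print("mean cross-validated accuracy:", sum(scores) / len(scores))
```

Pooling the per-fold confusion matrices gives one table covering every sample exactly once, while the list of per-fold accuracies shows how stable the score is across splits. The same loop could wrap a decision tree or a support vector machine in place of `knn_predict` to reproduce the kind of comparison the abstract describes.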
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by D.D. Baryshev, N.N. Barysheva, S.P. Pronin, and O.N. Nikol’skii. The first draft of the manuscript was written by N.N. Barysheva and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
The authors declare that they have no conflict of interest. This article does not contain any studies involving animals or human participants performed by any of the authors.
About this article
Cite this article
Baryshev, D.D., Barysheva, N.N., Pronin, S.P. et al. Comparison of Machine Learning Methods for Solving the Problem of Wheat Seeds Classification by Yield Properties. Russ. Agricult. Sci. 46, 410–417 (2020). https://doi.org/10.3103/S1068367420040047
DOI: https://doi.org/10.3103/S1068367420040047