Abstract
A current challenge for genetic breeding programs is to increase grain yield and protein content while at least maintaining oil content. However, evaluating industrial traits is time-consuming and costly. Accurate models that classify genotypes with better industrial technological performance from traits that are easier and faster to measure, such as agronomic ones, are therefore of great value to soybean breeding programs. The objective was to classify groups of soybean genotypes with respect to industrial technological variables, based on agronomic traits measured in the field, using machine learning (ML) techniques. Field experiments were carried out at two sites in a randomized block design with two replications and 206 F2 soybean populations. The agronomic traits evaluated were days to maturation (DM), first pod height (FPH), plant height (PH), number of branches (NB), main stem diameter (SD), mass of one hundred grains (MHG), and grain yield (GY). The industrial technological variables evaluated were oil yield, crude protein, crude fiber, and ash contents, determined by high-optical-accuracy near-infrared spectroscopy (NIRS). The models tested were support vector machine (SVM), artificial neural network (ANN), the decision tree models J48 and REPTree, random forest (RF), and logistic regression (LR, used as control). Genotypes were clustered using principal component analysis (PCA) and the k-means algorithm; the resulting clusters were then used as the output variable of the ML models, with the agronomic traits as input variables. ML techniques provided accurate models for classifying soybean genotypes with respect to the more complex industrial technological variables based on agronomic traits. RF outperformed the other models and can contribute to soybean breeding programs by classifying genotypes for industrial technological traits.
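The workflow described in the abstract (cluster genotypes on the industrial variables, then predict cluster membership from agronomic traits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the PCA step is omitted for brevity, two clusters are assumed, and only the logistic regression control is fitted (by plain gradient descent). All function and variable names are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in data, NOT the study's data: 60 hypothetical genotypes,
# each with 2 agronomic traits (inputs) and 2 industrial variables (clustering).
def make_genotype(group):
    shift = 1.5 if group else -1.5
    agro = [random.gauss(shift, 1.0), random.gauss(-shift, 1.0)]
    ind = [random.gauss(shift, 0.5), random.gauss(-shift, 0.5)]
    return agro, ind

data = [make_genotype(i % 2) for i in range(60)]
agro_X = [a for a, _ in data]
ind_Z = [z for _, z in data]

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k=2, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    centers = random.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j]))
                  for row in rows]
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Binary logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0

# Step 1: clusters of the industrial variables become the class labels.
labels = kmeans(standardize(ind_Z), k=2)

# Step 2: agronomic traits are the inputs. Scaling uses all rows for brevity;
# in practice the scaling would be fitted on the training rows only.
X = standardize(agro_X)
order = list(range(len(X)))
random.shuffle(order)
train, test = order[:45], order[45:]

# Step 3: fit the classifier and score it on the held-out genotypes.
w, b = fit_logistic([X[i] for i in train], [labels[i] for i in train])
accuracy = sum(predict(X[i], w, b) == labels[i] for i in test) / len(test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

In this sketch, any of the other models named in the abstract (SVM, ANN, J48, REPTree, RF) would take the place of `fit_logistic` and `predict`, with the cluster labels and agronomic inputs unchanged.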
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the Universidade Federal de Mato Grosso do Sul (UFMS); the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), grant numbers 303767/2020-0 and 304979/2022-8; and the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT), TO numbers 88/2021, 07/2022, 318/2022 and 94/2023, and SIAFEM numbers 30478, 31333, 32242 and 33111. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES), Finance Code 001.
Funding
The authors have not disclosed any funding.
Author information
Contributions
L.P.R.T., B.B., F.E.T., and P.E.T. collected the data. L.P.R.T., M.O.S., P.E.T., and P.C.C. drafted the manuscript. L.P.R.T., P.E.T., and M.O.S. performed all statistical analyses. C.A.S.J. and F.E.T. contributed a critical review of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
About this article
Cite this article
Teodoro, L.P.R., Silva, M.O., dos Santos, R.G. et al. Machine learning for classification of soybean populations for industrial technological variables based on agronomic traits. Euphytica 220, 40 (2024). https://doi.org/10.1007/s10681-024-03301-w
DOI: https://doi.org/10.1007/s10681-024-03301-w