Abstract
Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields in an independent year. Our methodology explicitly describes a general approach for integrating publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM performed similarly in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models to predict soybean yields at large scales in Brazil about one month before harvest (i.e. 90 DAS). When a well-trained RF model was used to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, corresponding to a relative error (rRMSE) between 9.2 and 41.5%. Although we developed robust data-driven models for yield prediction at large scales in Brazil, there is still room to improve their accuracy. Potential strategies to improve model accuracy include adding explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology).
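The abstract names two quantitative steps: screening out counties with missing or suspicious yield records, and scoring predictions with RMSE and rRMSE. A minimal Python sketch of both follows. The screening rule (a missing or non-positive yield counts as suspicious) and all function names are hypothetical, since the exact criterion is not given here; rRMSE is taken as RMSE divided by the mean observed yield, in percent, which is its usual definition.

```python
import math

def complete_counties(records, years):
    """Keep counties that have a plausible yield in every year of `years`.

    records: iterable of (county, year, yield_kg_ha) tuples; yield may be None.
    Hypothetical rule: a missing or non-positive yield counts as suspicious.
    """
    required = set(years)
    by_county = {}
    for county, year, yld in records:
        if yld is not None and yld > 0:
            by_county.setdefault(county, {})[year] = yld
    return {
        county: {year: ys[year] for year in required}
        for county, ys in by_county.items()
        if required.issubset(ys)
    }

def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. kg ha-1)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value."""
    return 100.0 * rmse(observed, predicted) / (sum(observed) / len(observed))

# Toy inputs, not data from the study.
years = range(1996, 2019)  # 23 seasons
records = [("A", y, 3000.0) for y in years]
records += [("B", y, None if y == 2005 else 2800.0) for y in years]
kept = complete_counties(records, years)
print(sorted(kept), len(kept) * len(years))  # ['A'] 23

obs = [3000.0, 3200.0, 2800.0, 3400.0]
pred = [3100.0, 3000.0, 2900.0, 3300.0]
print(round(rmse(obs, pred), 1), round(rrmse(obs, pred), 2))  # 132.3 4.27
```

With 150 retained counties and 23 complete seasons each, the same bookkeeping gives the 3450 records reported above.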
Acknowledgements
The authors thank the Sao Paulo Research Foundation (FAPESP) for funding support through grant numbers 2014/26767-9 and 2017/08970-0.
Ethics declarations
Conflict of interest
We declare that there are no conflicts of interest.
Supplementary Information
The electronic supplementary material is available in the files listed below.
42106_2022_209_MOESM1_ESM.tif
Figure S1 Relationship between the out-of-bag error (y-axis) and the number of trees (x-axis) when a random sample selection was performed for building the random forest models (TIF 3 kb)
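Figure S1 tracks the out-of-bag (OOB) error as trees are added to the forest. The sketch below shows how such a curve can be computed: each tree is fitted on a bootstrap sample, scored only on the rows it did not see, and the running OOB mean squared error of the ensemble is recorded after every tree. To stay dependency-free it grows depth-1 trees (stumps) with one randomly chosen candidate feature per split, a deliberate simplification of a full random forest; the synthetic data and all function names are hypothetical.

```python
import math
import random
import statistics

def fit_stump(X, y, rows, features):
    """Fit a depth-1 regression tree on bootstrap `rows`, trying only `features`.

    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    for f in features:
        pairs = sorted((X[i][f], y[i]) for i in rows)
        n = len(pairs)
        total = sum(v for _, v in pairs)
        left_sum = 0.0
        for k in range(1, n):
            left_sum += pairs[k - 1][1]
            if pairs[k - 1][0] == pairs[k][0]:
                continue  # no threshold separates equal feature values
            right_sum = total - left_sum
            # Minimising the split SSE is equivalent to maximising this score.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
                best = (score, f, threshold, left_sum / k, right_sum / (n - k))
    if best is None:  # every candidate feature was constant: predict the mean
        mean = sum(y[i] for i in rows) / len(rows)
        return (features[0], math.inf, mean, mean)
    return best[1:]

def predict_stump(stump, x):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if x[feature] <= threshold else right_mean

def oob_error_curve(X, y, n_trees=200, mtry=1, seed=0):
    """OOB mean squared error of the ensemble after each added tree."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    pred_sum, pred_cnt = [0.0] * n, [0] * n
    curve = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(rows)
        features = rng.sample(range(p), mtry)        # random feature subset
        stump = fit_stump(X, y, rows, features)
        for i in range(n):
            if i not in in_bag:                      # out-of-bag rows only
                pred_sum[i] += predict_stump(stump, X[i])
                pred_cnt[i] += 1
        scored = [i for i in range(n) if pred_cnt[i]]
        curve.append(
            sum((y[i] - pred_sum[i] / pred_cnt[i]) ** 2 for i in scored) / len(scored)
            if scored else math.nan
        )
    return curve

# Synthetic example: yield depends on the first feature only.
gen = random.Random(1)
X = [[gen.uniform(0, 10), gen.uniform(0, 10)] for _ in range(200)]
y = [2000 + 100 * x0 + gen.gauss(0, 50) for x0, _ in X]
curve = oob_error_curve(X, y, n_trees=200, mtry=1)
print("variance of y:", round(statistics.pvariance(y)))
for t in (1, 10, 50, 200):
    print(t, round(curve[t - 1]))
```

A curve that flattens indicates that adding more trees no longer lowers the OOB error, which is how a plot like Figure S1 is typically used to settle on the number of trees.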
42106_2022_209_MOESM2_ESM.tiff
Figure S2 Geospatial and temporal variability of technological trends in different regions where soybean is produced in Brazil. The dashed line represents the detrended yield, while the continuous line is the observed yield reported by IBGE. The grey area between the curves represents the impact of technological packages on final yields. The numbers in square brackets are the technological progress over the 23 years assessed (TIFF 1159 kb)
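Figure S2 separates observed yields from a technology trend. The caption does not state the detrending method, so the sketch below assumes one common choice: fit a linear trend of yield on year by least squares, then subtract the trend accumulated since the first year, so that every season is expressed at first-year technology. The slope multiplied by the length of the period then plays the role of the total technological progress. The data and function names are hypothetical.

```python
def linear_trend(years, yields):
    """Ordinary least-squares slope and intercept of yield on year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def remove_technology_trend(years, yields):
    """Express each yield at the technology level of the first year.

    Returns (detrended yields, slope per year, total progress over the period).
    """
    slope, _ = linear_trend(years, yields)
    base = years[0]
    detrended = [y - slope * (yr - base) for yr, y in zip(years, yields)]
    return detrended, slope, slope * (years[-1] - base)

years = list(range(1996, 2019))                        # 23 seasons
observed = [2400 + 40 * (yr - 1996) for yr in years]  # hypothetical, noise-free
detrended, slope, progress = remove_technology_trend(years, observed)
print(round(slope, 6), round(progress, 6))            # 40.0 880.0
```

Because the toy series is exactly linear, every detrended value equals the first-year yield; with real data the detrended series keeps the weather-driven year-to-year variation.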
42106_2022_209_MOESM3_ESM.tif
Figure S3 Variability of the rRMSE (%) over 100 repeated selections of the calibration and validation subsets used to build the data-driven models to predict and estimate soybean yields (TIF 341 kb)
42106_2022_209_MOESM4_ESM.tif
Figure S4 Variability of the R² over 100 repeated selections of the calibration and validation subsets used to build the data-driven models to predict and estimate soybean yields (TIF 351 kb)
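Figures S3 and S4 summarize rRMSE and R² across 100 choices of calibration and validation subsets. A sketch of such a resampling loop follows. It assumes a 70/30 random split and uses a single-predictor least-squares fit as a stand-in for MLR, RF or SVM; the split fraction, the R² definition (1 − SS_res/SS_tot, one of several in use), the synthetic data and the function names are assumptions, not details taken from the article.

```python
import random

def fit_ols(x, y):
    """Single-predictor least squares; stands in for whichever model is evaluated."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def rrmse(obs, pred):
    """Relative RMSE in percent of the mean observed value."""
    rmse = (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5
    return 100 * rmse / (sum(obs) / len(obs))

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def repeated_splits(x, y, n_repeats=100, calib_frac=0.7, seed=42):
    """Score the model on n_repeats random calibration/validation partitions."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)                  # new random partition each repeat
        cut = int(calib_frac * len(idx))
        cal, val = idx[:cut], idx[cut:]
        a, b = fit_ols([x[i] for i in cal], [y[i] for i in cal])
        obs = [y[i] for i in val]
        pred = [a + b * x[i] for i in val]
        scores.append((rrmse(obs, pred), r2(obs, pred)))
    return scores

gen = random.Random(7)
x = [gen.uniform(0, 10) for _ in range(100)]
y = [1000 + 200 * v + gen.gauss(0, 100) for v in x]  # hypothetical data
scores = repeated_splits(x, y)
rr = sorted(s[0] for s in scores)
print("rRMSE range: %.1f-%.1f %%" % (rr[0], rr[-1]))
```

The spread of the 100 scores, rather than any single split, is what a figure like S3 or S4 displays.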
42106_2022_209_MOESM6_ESM.tiff
Figure S6 Geospatial and temporal variability of precipitation during the simulated soybean cycle (accumulated from November to February) in the “high-quality” counties. Residuals were calculated as the relative deviation of the precipitation within the crop cycle from its 1996-2018 average (TIFF 2729 kb)
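The Figure S6 residuals are relative deviations of the November-February precipitation total from its multi-year mean. A minimal sketch follows, assuming monthly totals keyed by (year, month) and a season labelled by its harvest year (November-December of the previous year plus January-February); both conventions and the function names are hypothetical.

```python
def season_total(monthly, harvest_year):
    """Nov-Feb precipitation total (mm).

    Assumption: the season is labelled by its harvest year, so it covers
    Nov-Dec of harvest_year - 1 and Jan-Feb of harvest_year.
    monthly maps (year, month) -> precipitation in mm.
    """
    months = [(harvest_year - 1, 11), (harvest_year - 1, 12),
              (harvest_year, 1), (harvest_year, 2)]
    return sum(monthly[key] for key in months)

def relative_deviation(totals):
    """Percent deviation of each season total from the multi-year mean."""
    mean = sum(totals.values()) / len(totals)
    return {year: 100.0 * (value - mean) / mean for year, value in totals.items()}

monthly = {(1999, 11): 200.0, (1999, 12): 250.0, (2000, 1): 300.0, (2000, 2): 250.0}
print(season_total(monthly, 2000))  # 1000.0
print(relative_deviation({2000: 800.0, 2001: 1000.0, 2002: 1200.0}))
# {2000: -20.0, 2001: 0.0, 2002: 20.0}
```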
42106_2022_209_MOESM7_ESM.tiff
Figure S7 Geospatial and temporal variability of maximum air temperature during the simulated soybean cycle (average from November to February) in the “high-quality” counties. Residuals were calculated as the deviation of each year's temperature within the crop cycle from its 1996-2018 average (TIFF 2702 kb)
42106_2022_209_MOESM8_ESM.tiff
Figure S8 Geospatial and temporal variability of minimum air temperature during the simulated soybean cycle (average from November to February) in the “high-quality” counties. Residuals were calculated as the deviation of each year's temperature within the crop cycle from its 1996-2018 average (TIFF 2693 kb)
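Figures S7 and S8 use absolute rather than relative deviations: the November-February mean of maximum or minimum air temperature in each season minus its 1996-2018 average. The sketch below assumes monthly mean temperatures keyed by (year, month), seasons labelled by harvest year, and months weighted equally rather than by their number of days; these conventions and the function names are hypothetical.

```python
def season_mean(monthly, harvest_year):
    """Equal-weight mean of Nov-Dec (harvest_year - 1) and Jan-Feb (harvest_year).

    monthly maps (year, month) -> monthly mean temperature in degrees C.
    """
    months = [(harvest_year - 1, 11), (harvest_year - 1, 12),
              (harvest_year, 1), (harvest_year, 2)]
    return sum(monthly[key] for key in months) / len(months)

def absolute_deviation(series):
    """Deviation (degrees C) of each season value from the multi-year mean."""
    mean = sum(series.values()) / len(series)
    return {year: value - mean for year, value in series.items()}

tmax = {(1999, 11): 31.0, (1999, 12): 32.0, (2000, 1): 32.5, (2000, 2): 32.5}
print(season_mean(tmax, 2000))  # 32.0
print(absolute_deviation({2000: 30.5, 2001: 31.0, 2002: 31.5}))
# {2000: -0.5, 2001: 0.0, 2002: 0.5}
```

The same two functions apply unchanged to minimum air temperature.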
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Monteiro, L.A., Ramos, R.M., Battisti, R. et al. Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil. Int. J. Plant Prod. 16, 691–703 (2022). https://doi.org/10.1007/s42106-022-00209-0
DOI: https://doi.org/10.1007/s42106-022-00209-0