Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil

  • Research
  • International Journal of Plant Production

Abstract

Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and we evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields in an independent year. Our methodology explicitly describes a general approach for combining publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database of 3450 records (23 years × 150 “high-quality” counties). RF and SVM performed similarly in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models for predicting soybean yields at large scales in Brazil about one month before harvest (i.e. at 90 DAS). When a well-trained RF model was used to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, corresponding to a relative error (rRMSE) of 9.2 to 41.5%. Although we developed robust data-driven models for yield prediction at large scales in Brazil, there is still room to improve their accuracy. Potential strategies include adding explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology).
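The abstract reports prediction accuracy both as RMSE (kg ha⁻¹) and as relative RMSE (rRMSE, %). The Python sketch below shows how the two metrics relate, assuming the common convention of normalizing RMSE by the mean observed yield; the abstract does not state the exact normalization, and the yields are made-up illustrative values, not data from the study.

```python
import math


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean square error, in the units of the inputs (here kg ha^-1)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rrmse(observed: list[float], predicted: list[float]) -> float:
    """Relative RMSE in percent.

    Assumption: normalized by the mean observed yield, a common convention;
    the paper's exact normalization is not stated in the abstract.
    """
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical county yields (kg ha^-1) for one year; not data from the study.
observed = [3000.0, 3200.0, 2800.0, 3400.0]
predicted = [3100.0, 3000.0, 2900.0, 3300.0]
print(f"RMSE = {rmse(observed, predicted):.1f} kg ha^-1")
print(f"rRMSE = {rrmse(observed, predicted):.1f} %")
```

Under this convention, the same absolute error yields a larger rRMSE where mean yields are low, which is one reason the relative error can vary more widely across regions than the RMSE does.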

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

The authors thank the São Paulo Research Foundation (FAPESP) for funding support through grant numbers 2014/26767-9 and 2017/08970-0.

Author information

Corresponding author

Correspondence to Leonardo A. Monteiro.

Ethics declarations

Conflict of interest

We declare that there are no conflicts of interest.

Supplementary Information

The electronic supplementary material is listed below.

42106_2022_209_MOESM1_ESM.tif

Figure S1 Relationship between the out-of-bag error (y-axis) and the number of trees (x-axis) when a random sample selection was performed to build the random forest models (TIF 3 kb)

42106_2022_209_MOESM2_ESM.tiff

Figure S2 Geospatial and temporal variability of technological trends in the different regions where soybean is produced in Brazil. The dashed line represents the detrended yield, while the continuous line is the observed yield reported by IBGE. The grey area between the curves represents the impact of technological packages on final yields. The numbers in square brackets indicate the technological progress over the 23 years assessed (TIFF 1159 kb)
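The caption does not specify the detrending model. The Python sketch below assumes an ordinary least-squares linear trend of yield on year, with detrended yields expressed at the technology level of the first year. Under that assumption the gap between observed and detrended series widens over time, and the total gain over the period is slope × (last year − first year). The helper `detrend_to_first_year` and the yield series are hypothetical.

```python
def linear_trend(years: list[int], yields: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of yield = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def detrend_to_first_year(years: list[int], yields: list[float]) -> tuple[list[float], float]:
    """Hypothetical helper: remove an assumed linear technological trend.

    Yields are shifted to the technology level of the first year, so the gap
    between observed and detrended series widens over time. Returns the
    detrended series and the total trend gain between the first and last year.
    Assumes `years` is sorted in ascending order.
    """
    _, slope = linear_trend(years, yields)
    first = years[0]
    detrended = [y - slope * (x - first) for x, y in zip(years, yields)]
    technological_gain = slope * (years[-1] - first)
    return detrended, technological_gain


# Hypothetical county series for 1996-2018 (23 seasons): a 50 kg ha^-1 per year
# trend plus an alternating +/-150 kg ha^-1 "weather" signal.
years = list(range(1996, 2019))
yields = [2000.0 + 50.0 * (yr - 1996) + (150.0 if yr % 2 else -150.0) for yr in years]
detrended, gain = detrend_to_first_year(years, yields)
print(f"Technological gain over {len(years)} seasons: {gain:.0f} kg ha^-1")
```

Anchoring on the first year is a design choice. Anchoring on the last year instead expresses every season at current technology, which changes the level of the detrended curve but not its year-to-year variability.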

42106_2022_209_MOESM3_ESM.tif

Figure S3 Variability of the rRMSE (%) across 100 repeated selections of the calibration and validation subsets used to build the data-driven models for predicting and estimating soybean yields (TIF 341 kb)

42106_2022_209_MOESM4_ESM.tif

Figure S4 Variability of the R² across 100 repeated selections of the calibration and validation subsets used to build the data-driven models for predicting and estimating soybean yields (TIF 351 kb)
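Figures S3 and S4 summarize model skill over 100 different calibration/validation splits. The Python sketch below illustrates that idea under stated assumptions. The 70/30 split fraction and the seed are assumptions, and a one-predictor least-squares line stands in for the MLR, RF and SVM models. The predictor and yields are synthetic, so the resulting scores say nothing about the study's results.

```python
import math
import random
import statistics


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """OLS fit y = a + b*x; a one-predictor stand-in for the paper's models."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def r_squared(obs: list[float], pred: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rrmse(obs: list[float], pred: list[float]) -> float:
    """RMSE as a percentage of the mean observed value (assumed convention)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100.0 * rmse / statistics.fmean(obs)


def repeated_splits(x, y, n_repeats=100, calib_fraction=0.7, seed=42):
    """Draw a new random calibration/validation split on every repeat,
    refit on the calibration subset, and score on the validation subset.
    The 0.7 calibration fraction is an assumption, not from the paper."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_cal = int(calib_fraction * len(idx))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cal, val = idx[:n_cal], idx[n_cal:]
        a, b = fit_line([x[i] for i in cal], [y[i] for i in cal])
        obs = [y[i] for i in val]
        pred = [a + b * x[i] for i in val]
        scores.append((rrmse(obs, pred), r_squared(obs, pred)))
    return scores


# Hypothetical records: x is a seasonal predictor (e.g. rainfall, mm), y is yield (kg ha^-1).
gen = random.Random(0)
x = [gen.uniform(300.0, 900.0) for _ in range(200)]
y = [1500.0 + 2.5 * xi + gen.gauss(0.0, 200.0) for xi in x]
scores = repeated_splits(x, y)
print("median rRMSE (%):", round(statistics.median(s[0] for s in scores), 1))
print("median R2:", round(statistics.median(s[1] for s in scores), 2))
```

Reporting the spread of scores over many splits, rather than a single split, shows how sensitive a model's apparent skill is to which county-years happen to land in the validation subset.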

Figure S5 Performance of the RF model in predicting soybean yields at 90 DAS (TIF 1319 kb)

42106_2022_209_MOESM6_ESM.tiff

Figure S6 Geospatial and temporal variability of precipitation during the simulated soybean cycle (accumulated from November to February) in the “high-quality” counties. Residuals were calculated as the relative deviation of the within-cycle precipitation from its 1996–2018 average (TIFF 2729 kb)
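The relative precipitation deviation in Figure S6 can be sketched in Python as (P_season − mean) / mean × 100. This sketch assumes daily precipitation keyed by date and labels each November–February season by the year in which it ends; that labeling and the helper names are hypothetical. Only complete seasons should be passed in, because a partially covered season would bias its total low.

```python
from datetime import date, timedelta
from typing import Optional
import statistics


def season_of(d: date) -> Optional[int]:
    """Label a November-February day by the year in which its season ends
    (hypothetical convention); days in other months return None."""
    if d.month in (11, 12):
        return d.year + 1
    if d.month in (1, 2):
        return d.year
    return None


def seasonal_totals(daily_mm: dict[date, float]) -> dict[int, float]:
    """Accumulate daily precipitation over each November-February season."""
    totals: dict[int, float] = {}
    for d, mm in daily_mm.items():
        season = season_of(d)
        if season is not None:
            totals[season] = totals.get(season, 0.0) + mm
    return totals


def relative_anomalies(totals: dict[int, float]) -> dict[int, float]:
    """Percent deviation of each season's total from the mean over all seasons."""
    mean_total = statistics.fmean(totals.values())
    return {s: 100.0 * (p - mean_total) / mean_total for s, p in totals.items()}


# Hypothetical daily series covering three complete seasons (Nov 2016 - Feb 2019).
rate_mm_per_day = {2017: 5.0, 2018: 6.0, 2019: 7.0}
daily: dict[date, float] = {}
day = date(2016, 11, 1)
while day <= date(2019, 2, 28):
    season = season_of(day)
    # Off-season days get rain too, but seasonal_totals ignores them.
    daily[day] = rate_mm_per_day[season] if season is not None else 3.0
    day += timedelta(days=1)

totals = seasonal_totals(daily)
anomalies = relative_anomalies(totals)
for s in sorted(anomalies):
    print(s, f"{totals[s]:.0f} mm", f"{anomalies[s]:+.1f} %")
```

A relative deviation suits precipitation because its seasonal totals differ widely between regions. A fixed 100 mm shortfall matters more in a dry county than in a wet one.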

42106_2022_209_MOESM7_ESM.tiff

Figure S7 Geospatial and temporal variability of maximum air temperature during the simulated soybean cycle (averaged from November to February) in the “high-quality” counties. Residuals were calculated as the deviation of each year's within-cycle temperature from its 1996–2018 average (TIFF 2702 kb)

42106_2022_209_MOESM8_ESM.tiff

Figure S8 Geospatial and temporal variability of minimum air temperature during the simulated soybean cycle (averaged from November to February) in the “high-quality” counties. Residuals were calculated as the deviation of each year's within-cycle temperature from its 1996–2018 average (TIFF 2693 kb)
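Unlike precipitation, the temperature residuals in Figures S7 and S8 are absolute deviations (°C) of each season's mean from the multi-year average. The Python sketch below applies the same computation to maximum and minimum temperature. It assumes monthly mean temperatures for November to February and weights the four months equally, which is an assumption; averaging daily values would weight each month by its day count. All helper names and values are hypothetical.

```python
import statistics

MONTHS = ("Nov", "Dec", "Jan", "Feb")


def seasonal_means(monthly: dict[int, list[float]]) -> dict[int, float]:
    """Average the four monthly means (Nov, Dec, Jan, Feb) of each season.

    Assumption: months are weighted equally; averaging daily values instead
    would weight each month by its number of days.
    """
    for season, values in monthly.items():
        if len(values) != len(MONTHS):
            raise ValueError(f"season {season}: expected {len(MONTHS)} monthly values")
    return {season: statistics.fmean(values) for season, values in monthly.items()}


def absolute_anomalies(means: dict[int, float]) -> dict[int, float]:
    """Deviation (degrees C) of each season's mean from the all-season average."""
    climatology = statistics.fmean(means.values())
    return {season: t - climatology for season, t in means.items()}


# Hypothetical monthly means (degrees C) for one county, keyed by the season's ending year.
tmax = {2017: [31.0, 32.0, 32.0, 31.0], 2018: [30.0, 31.0, 31.0, 30.0], 2019: [32.0, 33.0, 33.0, 32.0]}
tmin = {2017: [20.0, 21.0, 21.0, 20.0], 2018: [20.0, 21.0, 21.0, 20.0], 2019: [21.0, 22.0, 22.0, 21.0]}

for name, series in (("Tmax", tmax), ("Tmin", tmin)):
    anomalies = absolute_anomalies(seasonal_means(series))
    print(name, {s: round(a, 2) for s, a in sorted(anomalies.items())})
```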

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Monteiro, L.A., Ramos, R.M., Battisti, R. et al. Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil. Int. J. Plant Prod. 16, 691–703 (2022). https://doi.org/10.1007/s42106-022-00209-0

  • DOI: https://doi.org/10.1007/s42106-022-00209-0