Abstract
Reliable yield estimation is crucial for food security and agricultural production, especially in intensively farmed regions. This study constructed a gridded yield estimation framework by driving machine learning models with a remote sensing vegetation index and meteorological forcing data. Among the eight machine learning methods tested, the support vector machine (SVM), k-nearest neighbor regression (KNN), and Gaussian process regression (GPR) models outperformed the others. Precipitation, temperature, and the fraction of photosynthetically active radiation were the key factors for yield estimation. Yield estimation was further conducted at the county and regional levels to explore the scale effect, that is, how estimation accuracy varies with spatial resolution. Different scales carry different spatial variability information, and finer scales, which represent spatial variability more fully, generally give better accuracy. This study demonstrates that more accurate yield estimation can be achieved at a finer grid level, thus providing guidance for planning agricultural planting structure.
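The scale effect described above can be illustrated with a small numerical sketch. It is not the authors' procedure: the yield surface is synthetic, and the helper name `block_mean`, the 32 × 32 grid, and the aggregation factors are assumptions chosen for illustration. The sketch averages a fine grid into progressively coarser blocks and records how much spatial variance survives at each scale.

```python
import numpy as np


def block_mean(grid, factor):
    """Aggregate a 2-D grid to a coarser one by averaging factor x factor blocks."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (number of blocks, cells per block), then average
    # over the two within-block axes.
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)

# Synthetic "yield" surface (hypothetical values): a smooth spatial gradient
# plus cell-level noise.
y_idx, x_idx = np.mgrid[0:32, 0:32]
fine = 6.0 + 0.05 * x_idx + 0.03 * y_idx + rng.normal(0.0, 0.5, size=(32, 32))

# Spatial variance retained at each aggregation level (1 = original grid,
# 32 = a single region-wide mean).
variances = {f: float(block_mean(fine, f).var()) for f in (1, 4, 16, 32)}

for factor, var in variances.items():
    print(f"aggregation factor {factor:>2}: spatial variance = {var:.4f}")
```

Because each coarser grid is a block average of the finer one, the retained variance cannot increase with aggregation, and it falls to zero once the whole domain is collapsed into one value. This is one way to see why coarser units expose less spatial variability to a model.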
Data availability
The data used in this research will be made available by the corresponding author upon reasonable request.
Code availability
The study primarily used the Python packages scikit-learn (sklearn) and matplotlib. The code will be available upon reasonable request.
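For readers who want a starting point in the meantime, the following is a minimal sketch, not the study's code, of how the three best-performing model families named in the abstract could be compared with scikit-learn. Everything specific here is an assumption: the data are synthetic, the four predictors and their ranges are hypothetical, and the hyperparameters are library defaults or arbitrary choices rather than the authors' settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
n = 200

# Synthetic predictors (hypothetical ranges, for illustration only).
precip = rng.uniform(100, 400, n)   # seasonal precipitation, mm
temp = rng.uniform(12, 24, n)       # mean temperature, deg C
fpar = rng.uniform(0.2, 0.9, n)     # FPAR, unitless
ndvi = np.clip(fpar + rng.normal(0, 0.05, n), 0, 1)  # vegetation index
X = np.column_stack([precip, temp, fpar, ndvi])

# Synthetic yield (t/ha) with a nonlinear temperature response plus noise.
y = (2.0 + 4.0 * fpar + 0.004 * precip
     - 0.05 * (temp - 18.0) ** 2 + rng.normal(0, 0.2, n))

models = {
    "SVM": SVR(kernel="rbf"),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "GPR": GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
        normalize_y=True,
        random_state=0,
    ),
}

# 5-fold cross-validated R^2; features are standardized inside each fold
# so that no information leaks from the held-out data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: float(
        cross_val_score(
            make_pipeline(StandardScaler(), model), X, y, cv=cv, scoring="r2"
        ).mean()
    )
    for name, model in models.items()
}

for name, r2 in scores.items():
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
```

Wrapping each model in a pipeline with `StandardScaler` matters for all three methods, since SVM, KNN, and GPR with an RBF kernel are distance-based and would otherwise be dominated by the precipitation column, which has the largest numeric range.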
Acknowledgements
The authors thank the Zhangye Statistical Bureau and Bayannur City Bureau of Agriculture and Animal Husbandry for providing the yield data. The authors also thank all the contributors for CMFD, MODIS and MIRCA datasets. The authors thank the reviewers for their constructive comments and useful suggestions on earlier versions of this manuscript.
Funding
This work was financially supported by the National Natural Science Foundation of China (51679233) and the Special Project on National Science and Technology Basic Resources Investigation of China (2021FY100703).
Author information
Contributions
Conceptualization, Jun Niu and Dehai Liao; Data curation, Dehai Liao and Na Lu; Formal analysis, Dehai Liao and Qianxi Shen; Funding acquisition, Jun Niu; Investigation, Qianxi Shen; Methodology, Jun Niu and Dehai Liao; Supervision, Jun Niu; Validation, Dehai Liao and Jun Niu; Visualization, Na Lu; Writing—original draft, Dehai Liao; Writing—review & editing, Jun Niu. All authors have read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
All authors consent to participate in the study.
Consent for publication
All authors consent to publish the study in a journal article.
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liao, D., Niu, J., Lu, N. et al. Towards crop yield estimation at a finer spatial resolution using machine learning methods over agricultural regions. Theor Appl Climatol 146, 1387–1401 (2021). https://doi.org/10.1007/s00704-021-03799-3
DOI: https://doi.org/10.1007/s00704-021-03799-3