Abstract
Eradicating hunger and malnutrition is a key development goal of the twenty-first century. This paper addresses the problem of optimally identifying seed varieties to reliably increase crop yield within a risk-sensitive decision-making framework. Specifically, it introduces a novel hierarchical machine learning mechanism for predicting crop yield (the yield of different seed varieties of the same crop). This prediction mechanism is then integrated with a weather forecasting model and three different approaches for decision making under uncertainty to select seed varieties for planting so as to balance yield maximization and risk. The model was applied to the problem of soybean variety selection posed in the 2016 Syngenta Crop Challenge. The prediction model achieved a median absolute error of 235 kg/ha and thus provides good estimates for input into the decision models. The decision models identified selections of soybean varieties that appropriately balance yield and risk as a function of the farmer’s level of risk aversion. More generally, the models can support farmers in deciding which seed varieties to plant.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Algorithm for solving robust optimization model
We use the following heuristic algorithm to efficiently solve the robust optimization model:
Appendix 2: Feature code key
Table 8 provides a detailed description of the 30 features in the Syngenta dataset.
Appendix 3: Cluster results from hierarchical clustering method
We implemented hierarchical clustering with Ward’s minimum variance method (which minimizes total within-cluster variance). To compare with the k-means clustering results, we set the number of clusters in the hierarchical clustering to 12. Results are shown in Table 9. Most of the groups (groups 1, 2, 4, 6, 7, 8, 9, 10, 11, 12) retained nearly the same varieties. Group 5 in the k-means results contained only one variety (V54), which became a member of group 4 in the hierarchical clustering. The members of the original group 3 were split into the new groups 3 and 5, with the new group 3 retaining most of them.
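To make the Ward criterion concrete, here is a minimal, unoptimized agglomerative clustering in plain Python. It is an illustration, not the authors' code, and the function names (`ward_clustering`, `best_match`) are hypothetical. At each step it merges the pair of clusters \(A, B\) with the smallest cost \(\frac{|A||B|}{|A|+|B|}\lVert \bar{x}_A - \bar{x}_B \rVert^2\), which equals the increase in total within-cluster sum of squares caused by the merge. The `best_match` helper assumes a k-means group label is available for every variety and reports, for each new cluster, the k-means group that shares the most members, which is the kind of comparison summarized in Table 9.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def ward_cost(a, b):
    """Increase in total within-cluster SSE if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def ward_clustering(points, k):
    """Agglomerative clustering with Ward's minimum variance criterion (hypothetical helper).

    Starts with every point in its own cluster and repeatedly merges the pair
    whose merge increases total within-cluster variance the least, until k
    clusters remain. Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[p] for p in clusters[i]],
                                 [points[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters


def best_match(new_clusters, old_labels):
    """For each new cluster: (k-means group with most shared members, shared count, cluster size)."""
    result = []
    for members in new_clusters:
        counts = {}
        for m in members:
            counts[old_labels[m]] = counts.get(old_labels[m], 0) + 1
        old = max(counts, key=counts.get)
        result.append((old, counts[old], len(members)))
    return result
```

The exhaustive pair search is cubic in the number of items, which is tolerable for a few dozen varieties; a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` is the practical choice for larger inputs.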
Appendix 4: Machine learning methods for weather prediction
We performed experiments using different machine learning methods to predict next year's temperature, precipitation, and radiation. We used the six annual values from 2008 to 2013 to predict the value for 2014. We randomly selected 75% of the data as training data and the remaining 25% as testing data. To find the best-performing model, we used tenfold cross-validation on the training data and then applied the best-performing models from the training phase to the testing data, measuring performance by \(R^{2}\), the coefficient of determination. Because the data have a simple structure, we compared only two models: linear regression and random forest. The performance is summarized in Table 10.
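The evaluation protocol can be sketched in standard-library Python: a 75/25 random split, tenfold cross-validation on the training part to choose a model, and \(R^{2} = 1 - SS_{res}/SS_{tot}\) on the held-out part. This is a hedged illustration under stated assumptions, not the authors' pipeline. It assumes each row of `X` holds one site's six annual values of a weather attribute and `y` holds the following year's value. Linear regression is fit by ordinary least squares via the normal equations, and because a random forest needs an external library, a hypothetical `fit_mean_of_years` baseline stands in as the second candidate. All function names are illustrative.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predictor function."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_mean_of_years(X, y):
    """Hypothetical stand-in model: predict the average of the past years."""
    return lambda x: sum(x) / len(x)


def cv_score(fit, X, y, k=10):
    """Mean R^2 over k interleaved folds of (already shuffled) training data."""
    idx = list(range(len(y)))
    scores = []
    for i in range(k):
        fold = idx[i::k]
        held = set(fold)
        tr = [j for j in idx if j not in held]
        model = fit([X[j] for j in tr], [y[j] for j in tr])
        scores.append(r_squared([y[j] for j in fold], [model(X[j]) for j in fold]))
    return sum(scores) / k


def select_and_test(X, y, models, seed=0, train_frac=0.75, k=10):
    """Pick the model with the best CV score on the training split; report test R^2."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    best = max(models, key=lambda name: cv_score(models[name], Xtr, ytr, k))
    model = models[best](Xtr, ytr)
    return best, r_squared([y[i] for i in te], [model(X[i]) for i in te])
```

Usage: `select_and_test(X, y, {"linear": fit_linear, "mean_of_years": fit_mean_of_years})` returns the name of the selected model and its \(R^{2}\) on the 25% test split.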
The random forest model achieved nearly perfect performance in predicting the 2014 weather attributes. However, in this dataset, the weather attributes in 2008 were significantly different from those in the remaining years. The average (temperature, precipitation, radiation) values across all sites in 2008 were (188.7, 3417.5, 516.1), whereas the averages from 2009 to 2014 were (3475.6, 631.7, 1082051). The year 2008 was particularly cold and wet, with less solar radiation; all other years (2009–2014) had broadly similar measurements. To use such a model for 2015, we would need to feed it the six measurements from 2009 to 2014, dropping the 2008 data and adding the 2014 data. Because the model learned its mapping from inputs whose first year was anomalous, applying it to a window in which every year is typical would likely produce inaccurate predictions for 2015. We chose instead to generate weather predictions with our non-parametric sampling methods, as such methods are likely to be more robust to extreme weather than the random forest model. Additionally, we wanted a range of possible weather values, reflecting natural uncertainty, rather than a single predicted value.
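This appendix does not specify the non-parametric sampling procedure, so the following is only an assumed illustration of the general idea: bootstrap resampling of whole historical years for a single site. Resampling complete years keeps temperature, precipitation, and radiation jointly consistent, includes unusual years such as 2008 in proportion to their historical frequency, and yields a distribution of scenarios rather than a single forecast. The names `bootstrap_weather` and `summarize` are hypothetical.

```python
import random
import statistics


def bootstrap_weather(history, n_scenarios, seed=0):
    """Assumed illustration: draw weather scenarios by resampling whole years.

    history: dict mapping year -> (temperature, precipitation, radiation)
    for one site. Each scenario is one historical year drawn with replacement,
    so the three attributes in a scenario always come from the same year.
    """
    rng = random.Random(seed)
    years = sorted(history)
    return [history[rng.choice(years)] for _ in range(n_scenarios)]


def summarize(scenarios):
    """Per-attribute (min, median, max) across the sampled scenarios."""
    columns = list(zip(*scenarios))
    return [(min(c), statistics.median(c), max(c)) for c in columns]
```

Each scenario could then be passed to the yield-prediction model, so that downstream decisions see a spread of weather outcomes instead of one point estimate.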
Cite this article
Zhong, H., Li, X., Lobell, D. et al. Hierarchical modeling of seed variety yields and decision making for future planting plans. Environ Syst Decis 38, 458–470 (2018). https://doi.org/10.1007/s10669-018-9695-4