Hierarchical modeling of seed variety yields and decision making for future planting plans

Environment Systems and Decisions

Abstract

Eradicating hunger and malnutrition is a key development goal of the twenty-first century. This paper addresses the problem of optimally identifying seed varieties to reliably increase crop yield within a risk-sensitive decision-making framework. Specifically, a novel hierarchical machine learning mechanism for predicting crop yield (the yield of different seed varieties of the same crop) is introduced. This prediction mechanism is then integrated with a weather forecasting model and three different approaches for decision making under uncertainty to select seed varieties for planting so as to balance yield maximization and risk. The model was applied to the problem of soybean variety selection given in the 2016 Syngenta Crop Challenge. The prediction model achieved a median absolute error of 235 kg/ha and thus provides good estimates for input into the decision models. The decision models identified selections of soybean varieties that appropriately balance yield and risk as a function of the farmer’s level of risk aversion. More generally, the models can support farmers in deciding which seed varieties to plant.

Author information

Corresponding author

Correspondence to Huaiyang Zhong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendices

Appendix 1: Algorithm for solving robust optimization model

We use the following heuristic algorithm to efficiently solve the robust optimization model:

figure a

Appendix 2: Feature code key

Table 8 provides a detailed description of the 30 features in the Syngenta dataset.

Table 8 Description of features

Appendix 3: Cluster results from hierarchical clustering method

We implemented hierarchical clustering with Ward’s minimum variance method, which at each step merges the pair of clusters that gives the smallest increase in total within-cluster variance. To allow comparison with the k-means clustering results, we set the number of clusters to 12. Results are shown in Table 9. Most of the groups (groups 1, 2, 4, 6, 7, 8, 9, 10, 11, and 12) retained nearly the same varieties. The original k-means group 5 contained only one variety (V54), which became a member of group 4 under hierarchical clustering. The members of the original group 3 were split between the new groups 3 and 5, with the new group 3 retaining most of them.

Table 9 Clustering results from hierarchical clustering method
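As a rough illustration of the merging rule behind Ward’s method, the following sketch clusters a handful of made-up feature vectors. It is not the authors' implementation: the function name `ward_clusters`, the variety labels, and the two-dimensional feature values are all hypothetical, and the toy example uses 3 clusters rather than the 12 used in the paper.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in total within-cluster variance if a and b are merged:
        # |A||B| / (|A| + |B|) * ||centroid_A - centroid_B||^2
        (members_a, cen_a), (members_b, cen_b) = a, b
        dist2 = sum((x - y) ** 2 for x, y in zip(cen_a, cen_b))
        na, nb = len(members_a), len(members_b)
        return na * nb / (na + nb) * dist2

    while len(clusters) > n_clusters:
        # Pick the pair (i < j) whose merge is cheapest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (members_a, cen_a), (members_b, cen_b) = clusters[i], clusters[j]
        na, nb = len(members_a), len(members_b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(cen_a, cen_b)]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = (members_a + members_b, centroid)

    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical varieties with two invented features each.
names = ["A1", "A2", "B1", "B2", "C1", "C2"]
features = [(3.0, 0.20), (3.1, 0.25), (4.0, 0.60),
            (4.1, 0.55), (2.0, 0.10), (2.1, 0.15)]
labels = ward_clusters(features, n_clusters=3)
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

Group memberships produced this way can then be compared with those from k-means, which is the kind of comparison Table 9 reports. For a dataset of realistic size, a library routine would be preferable to this cubic-time loop.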

Appendix 4: Machine learning methods for weather prediction

We experimented with different machine learning methods for predicting temperature, precipitation, and radiation in the following year. We used the six values from 2008 to 2013 to predict the value for 2014. We randomly selected 75% of the data for training and held out the remaining 25% for testing. To find the best-performing model, we used tenfold cross-validation on the training data and then evaluated the selected model on the testing data, measuring performance by \(R^{2}\), the coefficient of determination. Because the data have a simple structure, we compared only two models: linear regression and random forest. The performance is summarized in Table 10.
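The evaluation protocol described above (a 75/25 split, tenfold cross-validation on the training portion, and \(R^{2}\) on the held-out portion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the helper names are hypothetical, and for brevity the only model is a one-feature least-squares line fitted to the mean of the six prior values, standing in for the linear regression and random forest models compared in the paper.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_val_r2(xs, ys, k=10):
    """Mean R^2 over k folds (folds are interleaved index sets)."""
    n = len(xs)
    scores = []
    for fold in (list(range(i, n, k)) for i in range(k)):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(r_squared([ys[i] for i in fold],
                                [a + b * xs[i] for i in fold]))
    return sum(scores) / k


# Synthetic stand-in data: one row per site, six prior values and a target.
rng = random.Random(0)
sites = []
for _ in range(200):
    base = rng.uniform(500.0, 700.0)
    history = [base + rng.gauss(0.0, 20.0) for _ in range(6)]
    target = base + rng.gauss(0.0, 20.0)
    sites.append((sum(history) / 6, target))  # feature: mean of six values

rng.shuffle(sites)
split = int(0.75 * len(sites))  # 75% training, 25% testing
train, test = sites[:split], sites[split:]
train_x, train_y = [x for x, _ in train], [y for _, y in train]
test_x, test_y = [x for x, _ in test], [y for _, y in test]

cv_r2 = cross_val_r2(train_x, train_y, k=10)
a, b = fit_line(train_x, train_y)
test_r2 = r_squared(test_y, [a + b * x for x in test_x])
print(f"CV R^2: {cv_r2:.3f}, test R^2: {test_r2:.3f}")
```

With several candidate models, the cross-validated score would be computed for each one and the best candidate refitted on the full training set before scoring it on the test set.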

The random forest model achieved nearly perfect performance in predicting the 2014 weather attributes. However, in this dataset, the weather attributes in 2008 differed markedly from those in the remaining years. The average (temperature, precipitation, radiation) values across all sites in 2008 were (188.7, 3417.5, 516.1), whereas the averages from 2009 to 2014 were (3475.6, 631.7, 1082051). The year 2008 was particularly cold and wet, with less solar radiation; the other years (2009–2014) had broadly similar measurements. If we were to use such a model to predict values for 2015, we would need to use the six measurements from 2009 to 2014 as inputs, dropping the 2008 data and adding the 2014 data. Because the model was trained on inputs that included the anomalous 2008 values, it would likely predict inaccurate values for 2015. We chose instead to use our non-parametric sampling methods to generate weather predictions, as such a method is likely to be more robust to extreme weather than the random forest model. Additionally, we wanted a range of possible weather values, reflecting natural uncertainty, rather than a single predicted value.
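To make the idea of sampling a range of weather values concrete, here is one simple non-parametric scheme: resampling whole historical years with replacement, so that temperature, precipitation, and radiation are always drawn together. This is only an assumed stand-in; the paper's own sampling method is defined in the main text and may differ. The function names and all numeric values below are invented for illustration and are not taken from the Syngenta dataset.

```python
import random


def sample_weather_scenarios(history, n_scenarios, seed=None):
    """Draw scenarios by resampling historical years with replacement.

    `history` maps year -> (temperature, precipitation, radiation).
    Sampling a whole year keeps the three attributes jointly consistent.
    """
    rng = random.Random(seed)
    years = sorted(history)
    return [history[rng.choice(years)] for _ in range(n_scenarios)]


def empirical_range(values, lower=0.05, upper=0.95):
    """Approximate empirical quantile range of the sampled values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Invented toy records for one hypothetical site (arbitrary units).
history = {
    2009: (20.1, 610.0, 15.2),
    2010: (21.3, 580.0, 15.9),
    2011: (20.7, 655.0, 14.8),
    2012: (22.0, 540.0, 16.4),
    2013: (20.4, 630.0, 15.0),
    2014: (21.0, 600.0, 15.5),
}

scenarios = sample_weather_scenarios(history, n_scenarios=1000, seed=1)
precip_low, precip_high = empirical_range([s[1] for s in scenarios])
print(f"Precipitation range across scenarios: {precip_low} to {precip_high}")
```

Each sampled scenario could then be passed through a yield prediction model, giving a distribution of predicted yields rather than a single point estimate.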

Table 10 Performance summary of weather attributes prediction

Cite this article

Zhong, H., Li, X., Lobell, D. et al. Hierarchical modeling of seed variety yields and decision making for future planting plans. Environ Syst Decis 38, 458–470 (2018). https://doi.org/10.1007/s10669-018-9695-4

  • DOI: https://doi.org/10.1007/s10669-018-9695-4
