Sugar Tech

pp 1–10

Sugarcane Yield Prediction Through Data Mining and Crop Simulation Models

  • Ralph G. Hammer
  • Paulo C. Sentelhas
  • Jean C. Q. Mariano
Research Article


Understanding the hierarchical importance of the factors that influence sugarcane yield can support its modeling, thus contributing to the optimization of agricultural planning and crop yield estimates. The objectives of this study were to identify and rank the main variables that condition sugarcane yield according to their relative importance, and to develop mathematical models for predicting sugarcane yield using data mining (DM) techniques. To this end, three DM techniques were applied to databases from several sugar mills in the state of São Paulo, Brazil. Meteorological and crop management variables were analyzed with random forest, boosting, and support vector machine, and the resulting models were tested against an independent data set. Finally, the predictive performance of these models was compared with that of a simple agrometeorological model applied to the same data set. Among all the variables assessed, the number of cuts was the most important factor for all DM techniques. The comparison between observed yields and those estimated by the DM models resulted in a root mean square error (RMSE) between 19.70 and 20.03 t ha⁻¹, much better than the performance of the Agroecological Zone Model, which presented an RMSE of approximately 34 t ha⁻¹.
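The abstract does not state how variable importance was measured, so the following is only a minimal, model-agnostic sketch, assuming permutation importance scored by RMSE (the same error metric used to evaluate the models); it is not necessarily the procedure used by the DM techniques in this study. The names `rmse`, `permutation_importance` and `toy_model`, and the data rows, are hypothetical and illustrative, not results from the paper.

```python
import math
import random
from typing import Callable, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. t ha-1)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    rows: list[list[float]],
    observed: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> list[float]:
    """Mean increase in RMSE when one input column is shuffled.

    A larger increase means the model relies more on that variable.
    """
    rng = random.Random(seed)
    baseline = rmse(observed, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between variable j and yield
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(observed, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical fitted model and data, for illustration only:
# column 0 = number of cuts, column 1 = seasonal rainfall (mm).
def toy_model(row: Sequence[float]) -> float:
    return 130.0 - 9.0 * row[0] + 0.005 * row[1]


rows = [[1, 1200], [2, 1100], [3, 1300], [4, 1000], [5, 1250], [6, 1150]]
observed = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, observed)
# Column indices ordered from most to least important.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

Because the score is an RMSE increase, importances and final model errors share the same units, which makes it easy to compare how much each variable contributes relative to the overall prediction error. In practice `toy_model` would be replaced by a fitted random forest, boosting, or support vector machine model, and `rows`/`observed` by the independent test set.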


Keywords: Yield estimation; Random forest; Boosting; Support vector machines; Crop model


Author Contributions

RGH and PCS were both responsible for designing the study, analyzing and discussing the results, and writing the manuscript. JCQM was responsible for writing the scripts for data organization, consistency checking, and analysis.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Society for Sugar Research & Promotion 2019

Authors and Affiliations

  1. Department of Biosystems Engineering, ESALQ, University of São Paulo, Piracicaba, Brazil
  2. Independent Data Science Specialist, Piracicaba, Brazil
