Data Mining in Precision Agriculture: Management of Spatial Information

  • Georg Ruß
  • Alexander Brenning
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6178)


Precision Agriculture is the application of state-of-the-art GPS technology in connection with site-specific, sensor-based treatment of the crop. It can also be described as a data-driven approach to agriculture, which is strongly connected with a number of data mining problems. One of these, yield prediction, is also an inherently important task in agriculture. The question is whether a field’s yield can be predicted in-season from the available geo-coded data sets.

In the past, a number of approaches to this problem have been proposed. Often, a broad variety of regression models for non-spatial data has been used, such as regression trees, neural networks and support vector machines. In a cross-validation setting, however, the assumption that the data records are statistically independent keeps causing problems. Hence, the geographical location of data records should be taken into account both when establishing a regression model and when assessing its predictive performance. This paper gives a short overview of the available data, points out in detail the main issue with the classical learning approaches and presents a novel spatial cross-validation technique that overcomes this issue for the aforementioned yield prediction task.
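The contrast between classical and spatial cross-validation can be illustrated with a minimal sketch. The abstract does not say how the spatial folds are constructed, so the grid-block partition below is a stand-in chosen for simplicity, not the authors' technique; the function names, the 20 x 20 field, and the block size are all hypothetical. The sketch measures how many records still have a training record in their immediate neighbourhood at the moment they are held out, which is where the independence assumption is most likely to be violated.

```python
import random
from collections import defaultdict


def random_folds(n_points, k, seed=0):
    """Classical k-fold: shuffle record indices and ignore their location."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, block_size):
    """One fold per square block, so each block is held out as a whole.

    Assumption: grid blocks stand in for whatever spatial partition is
    actually used; the abstract does not specify it.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    return [blocks[key] for key in sorted(blocks)]


def share_near_training(coords, folds, radius):
    """Fraction of records that have a training record within `radius`
    at the moment they are held out for testing."""
    hits = 0
    for fold in folds:
        held_out = set(fold)
        train = [c for i, c in enumerate(coords) if i not in held_out]
        for i in fold:
            x, y = coords[i]
            if any((x - tx) ** 2 + (y - ty) ** 2 <= radius ** 2
                   for tx, ty in train):
                hits += 1
    return hits / len(coords)


# Hypothetical field: a 20 x 20 grid of records, one unit apart.
coords = [(float(x), float(y)) for x in range(20) for y in range(20)]
spatial = spatial_block_folds(coords, block_size=5)
classic = random_folds(len(coords), k=len(spatial), seed=0)

print("random folds :", share_near_training(coords, classic, radius=1.0))
print("spatial folds:", share_near_training(coords, spatial, radius=1.0))
```

With random folds, almost every held-out record sits next to a training record, so a model can score well by exploiting spatial autocorrelation alone. Holding out whole blocks leaves only the records along block borders in that position. Either fold list can then drive the usual loop of fitting a regression model on the training indices and scoring it on the held-out ones.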


Precision Agriculture, Spatial Data Mining, Regression, Spatial Cross-Validation





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Georg Ruß (1)
  • Alexander Brenning (2)
  1. Otto-von-Guericke-Universität Magdeburg, Germany
  2. University of Waterloo, Canada
