Operational Research, Volume 9, Issue 2, pp 121–140

A survey of data mining techniques applied to agriculture

  • A. Mucherino
  • Petraq Papajorgji
  • P. M. Pardalos


In this survey we present some of the most widely used data mining techniques in the field of agriculture. Some of these techniques, such as k-means, k nearest neighbor, artificial neural networks and support vector machines, are discussed, and an application in agriculture is presented for each of them. Data mining in agriculture is a relatively novel research field. It is our opinion that efficient techniques can be developed and tailored for solving complex agricultural problems using data mining. At the end of this survey we provide recommendations for future research directions in agriculture-related fields.


Data mining, Optimization, k nearest neighbor, k-means, Support vector machines, Artificial neural networks, Agriculture



This research has been partially supported by NSF grants.



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • A. Mucherino (1)
  • Petraq Papajorgji (2)
  • P. M. Pardalos (2)
  1. LIX, École Polytechnique, Palaiseau, France
  2. Center for Applied Optimization, University of Florida, Gainesville, USA
