A survey of data mining techniques applied to agriculture

Abstract

This survey presents some of the data mining techniques most widely used in agriculture. Several of them, including k-means, k-nearest neighbor, artificial neural networks and support vector machines, are discussed, and an agricultural application of each is presented. Data mining in agriculture is a relatively new research field. In our opinion, efficient data mining techniques can be developed and tailored to solve complex agricultural problems. The survey closes with recommendations for future research directions in agriculture-related fields.

Acknowledgment

This research has been partially supported by NSF grants.

Author information

Corresponding author

Correspondence to Petraq Papajorgji.

About this article

Cite this article

Mucherino, A., Papajorgji, P. & Pardalos, P.M. A survey of data mining techniques applied to agriculture. Oper Res Int J 9, 121–140 (2009). https://doi.org/10.1007/s12351-009-0054-6

Keywords

  • Data mining
  • Optimization
  • k nearest neighbor
  • k-means
  • Support vector machines
  • Artificial neural networks
  • Agriculture