Modeling the risk of water pollution by pesticides from imbalanced data

  • Research Article
  • Published:
Environmental Science and Pollution Research

Abstract

The pollution of ground and surface waters with pesticides is a serious ecological issue that requires adequate treatment. Most existing water pollution models are mechanistic mathematical models. While they have made a significant contribution to understanding the transfer processes, they are difficult to validate because of their complexity, user subjectivity in their parameterization, and the lack of empirical data for validation. In addition, the data describing water pollution with pesticides are, in most cases, highly imbalanced. This is due to strict regulations on pesticide applications, which result in only a few pollution events. In this study, we propose the use of data mining to build models for assessing the risk of water pollution by pesticides in outflow water from drained fields. Unlike mechanistic models, the models generated by data mining are based on easily obtainable empirical data, and their parameterization is not influenced by the subjectivity of ecological modelers. We used empirical data from field trials at the La Jaillière experimental site in France and applied the random forests algorithm to build models that classify pesticide application events as “risky” or “not-risky”. To address the class imbalance in the data, we used cost-sensitive learning and several measures of predictive performance. Despite the high imbalance between risky and not-risky application events, we were able to build models that make reliable predictions. The proposed modeling approach can be easily applied to other ecological modeling problems involving empirical data with highly imbalanced classes.
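The abstract names two ingredients, cost-sensitive learning and imbalance-aware performance measures, without giving details. The minimal Python sketch below illustrates both under stated assumptions; it is not the authors' implementation and does not train a random forest. It assumes the classifier outputs a probability of “risky” (for a random forest, for example, the fraction of trees voting “risky”), uses a minimum-expected-cost decision rule as one common form of cost-sensitive classification, and shows Cohen's kappa and the Matthews correlation coefficient (MCC) as typical imbalance-aware measures, not necessarily the ones reported in the study. The toy data, the probabilities, and the 10:1 cost ratio are hypothetical.

```python
import math

# Hypothetical toy data: 20 application events, only 2 of them "risky" (1).
# The probabilities stand in for a random forest's share of trees voting
# "risky"; they are invented for illustration, not taken from the study.
Y_TRUE = [0] * 18 + [1, 1]
P_RISKY = [0.05] * 16 + [0.20, 0.30] + [0.25, 0.60]


def cost_sensitive_label(p_risky, cost_fn, cost_fp):
    """Pick the class with the lower expected misclassification cost.

    Predicting "not-risky" costs p_risky * cost_fn on average (a missed
    pollution event); predicting "risky" costs (1 - p_risky) * cost_fp
    (a false alarm). Equal costs reduce to the usual 0.5 threshold.
    """
    return 1 if p_risky * cost_fn > (1 - p_risky) * cost_fp else 0


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) with "risky" = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Accuracy plus measures that stay informative under class imbalance."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    # Matthews correlation coefficient (0.0 when a margin is empty).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    baseline = [0] * len(Y_TRUE)  # always predict "not-risky"
    equal_cost = [cost_sensitive_label(p, 1, 1) for p in P_RISKY]
    # Assumed costs: missing a risky event is 10x worse than a false alarm.
    weighted = [cost_sensitive_label(p, 10, 1) for p in P_RISKY]
    for name, pred in [("baseline", baseline), ("equal cost", equal_cost),
                       ("cost-sensitive", weighted)]:
        m = metrics(Y_TRUE, pred)
        print(name, {k: round(v, 3) for k, v in m.items()})
```

On this toy data, the always-“not-risky” baseline reaches 0.9 accuracy while its kappa and MCC are both 0, which is why accuracy alone is misleading when risky events are rare. With equal costs only one of the two risky events is detected (recall 0.5); with the assumed 10:1 cost ratio both are detected (recall 1.0) at the price of two false alarms (precision 0.5).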


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8



Acknowledgements

The authors wish to acknowledge the support of the project EVADIFF (Evaluation of existing models and development of new decision-making tools to prevent diffuse pollution of water caused by phytopharmaceutical products) financed by ARVALIS, Institut du végétal, France.

Author information

Corresponding author

Correspondence to Aneta Trajanov.

Additional information

Responsible editor: Marcus Schulz

Electronic supplementary material

ESM 1

(PDF 47 kb)


About this article


Cite this article

Trajanov, A., Kuzmanovski, V., Real, B. et al. Modeling the risk of water pollution by pesticides from imbalanced data. Environ Sci Pollut Res 25, 18781–18792 (2018). https://doi.org/10.1007/s11356-018-2099-7


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11356-018-2099-7

Keywords