Abstract
The pollution of ground and surface waters with pesticides is a serious ecological issue that requires adequate treatment. Most existing water pollution models are mechanistic mathematical models. While they have contributed significantly to understanding the transfer processes, they are difficult to validate because of their complexity, user subjectivity in their parameterization, and the lack of empirical data. In addition, the data describing water pollution with pesticides are, in most cases, highly imbalanced: strict regulations for pesticide applications lead to only a few pollution events. In this study, we propose the use of data mining to build models for assessing the risk of water pollution by pesticides in field-drained outflow water. Unlike mechanistic models, the models generated by data mining are based on easily obtainable empirical data, and their parameterization is not influenced by the subjectivity of ecological modelers. We used empirical data from field trials at the La Jaillière experimental site in France and applied the random forests algorithm to build predictive models that classify pesticide application events as “risky” or “not-risky”. To address the class imbalance in the data, we used cost-sensitive learning and several measures of predictive performance. Despite the high imbalance between risky and not-risky application events, we managed to build predictive models that make reliable predictions. The proposed modeling approach can be easily applied to other ecological modeling problems that involve empirical data with highly imbalanced classes.
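
The abstract does not name the performance measures that were used. As an illustration only, the sketch below assumes four measures that are commonly reported for imbalanced classes (precision, recall, Cohen's kappa, and the Matthews correlation coefficient) and shows why plain accuracy can mislead when risky events are rare. The function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def imbalance_metrics(tp, fp, fn, tn):
    """Compute several measures from a 2x2 confusion matrix.

    tp/fn count the rare "risky" events that were hit/missed;
    fp/tn count the "not-risky" events that were flagged/cleared.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient: defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical data: 1000 application events, only 20 of them risky.
# A trivial model that always predicts "not-risky":
trivial = imbalance_metrics(tp=0, fp=0, fn=20, tn=980)
# A model that finds 15 of the 20 risky events at the cost of 30 false alarms:
useful = imbalance_metrics(tp=15, fp=30, fn=5, tn=950)

for name, scores in (("trivial", trivial), ("useful", useful)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the trivial model has the higher accuracy, yet its kappa and MCC are zero, while the model that actually detects risky events scores clearly above zero on both. Chance-corrected measures of this kind are therefore better suited to judging models trained on imbalanced data.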








Acknowledgements
The authors wish to acknowledge the support of the project EVADIFF (Evaluation of existing models and development of new decision-making tools to prevent diffuse pollution of water caused by phytopharmaceutical products) financed by ARVALIS, Institut du végétal, France.
Additional information
Responsible editor: Marcus Schulz
Electronic supplementary material
ESM 1 (PDF 47 kb)
Cite this article
Trajanov, A., Kuzmanovski, V., Real, B. et al. Modeling the risk of water pollution by pesticides from imbalanced data. Environ Sci Pollut Res 25, 18781–18792 (2018). https://doi.org/10.1007/s11356-018-2099-7
DOI: https://doi.org/10.1007/s11356-018-2099-7


