Environmental Science and Pollution Research, Volume 26, Issue 2, pp 1821–1833

Evaluation of the bias and precision of regression techniques and machine learning approaches in total dissolved solids modeling of an urban aquifer

  • Conglian Pan
  • Kelvin Tsun Wai Ng
  • Bahareh Fallah
  • Amy Richter
Research Article

Abstract

Total dissolved solids (TDS) are modeled for an aquifer near an unlined landfill in Canada. Canadian Drinking Water Guidelines and other indices are used to evaluate TDS concentrations in 27 monitoring wells surrounding the landfill. This study aims to predict TDS concentrations using three different modeling approaches: dual-step multiple linear regression (MLR), hybrid principal component regression (PCR), and backpropagation neural networks (BPNN). The bias and precision of each model are then analyzed using performance evaluation metrics and statistical indices. TDS is one of the most important parameters in assessing the suitability of water for irrigation and in overall groundwater quality assessment. Good agreement was observed between the MLR1 model and field data, although multicollinearity issues exist. Percentage errors of hybrid PCR were comparable to those of the dual-step MLR method. The percentage error for hybrid PCR was found to be inversely proportional to TDS concentration, a trend that was not observed for dual-step MLR. Larger errors were obtained from the BPNN models, and higher percentage errors were observed in monitoring wells with lower TDS concentrations. All models in this study adequately describe the data in the testing stage (R² > 0.86). Overall, the dual-step MLR and hybrid PCR models fared better (average R² = 0.981 and 0.974, respectively), while the BPNN models performed worse (average R² = 0.904). For this dataset, both regression and machine learning models are better suited to predicting mid-range data than extreme values. The advanced regression methods (hybrid PCR and dual-step MLR) are more advantageous than BPNN.
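The evaluation measures named in the abstract (R² and percentage error) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names, the TDS values, and the use of the mean and standard deviation of signed percentage errors as stand-ins for bias and precision are not taken from the paper, which may define its indices differently.

```python
from statistics import mean, stdev


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percentage_errors(observed, predicted):
    """Signed percentage error of each prediction relative to the observation."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


# Hypothetical TDS concentrations (mg/L); NOT data from the study.
observed = [500.0, 1000.0, 2000.0, 4000.0]
# Hypothetical predictions with the same absolute error at every well.
predicted = [550.0, 1050.0, 2050.0, 4050.0]

errors = percentage_errors(observed, predicted)
r2 = r_squared(observed, predicted)
bias = mean(errors)        # assumed proxy for bias: mean signed % error
precision = stdev(errors)  # assumed proxy for precision: spread of % errors

print(f"R^2 = {r2:.4f}")
print("percentage errors:", errors)
print(f"bias = {bias:.2f} %, precision = {precision:.2f} %")
```

With a constant absolute error, the percentage error shrinks as the observed concentration grows, which is one simple way that wells with lower TDS can show higher percentage errors even when the overall R² is high.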

Keywords

Total dissolved solids; Artificial neural network; Principal component regression; Multivariate statistical analysis; Machine learning methods; Bias and precision

Notes

Funding information

The research reported in this paper was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-385815). The authors are grateful for their support. The views expressed herein are those of the writers and not necessarily those of our research and funding partners.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Environmental Systems Engineering, University of Regina, Regina, Canada
