Abstract
Influent flow of wastewater treatment plants (WWTPs) is a crucial variable for plant operation and management. In this study, a random forest (RF) model was applied for daily wastewater inflow prediction, and a new probabilistic prediction approach was, for the first time, applied for quantifying the uncertainties associated with wastewater inflow prediction. The RF model uses regression trees to capture the nonlinear relationship between wastewater inflow and various influencing factors, such as weather features and domestic water usage patterns. The proposed model was applied to the daily wastewater inflow prediction for two WWTPs (i.e., Humber and one confidential plant) in Ontario, Canada. For the confidential WWTP, the coefficient of determination (\(\varvec{R}^{2}\)) values for training and testing were 0.971 and 0.722, respectively. The \(\varvec{R}^{2}\) values at the Humber WWTP were 0.957 and 0.584 for training and testing, respectively. In comparison with other approaches such as the multilayer perceptron neural networks (MLP) models and autoregressive integrated moving average models, the results show that the RF model performs well on predicting inflow. In addition, probabilistic prediction of daily inflow was generated. For the Humber station, 93.56% of the total testing samples fall into its corresponding predicted interval. For the confidential plant, 78 observed values of the total 89 samples fall into its corresponding interval, accounting for 87.64% of the total testing samples. The results show that the probabilistic approach can provide robust decision support for the operation, management, and optimization of WWTPs.
This is a preview of subscription content, access via your institution.







References
Abdel-Rahman EM, Ahmed FB, Ismail R (2013) Random forest regression and spectral band selection for estimating sugarcane leaf nitrogen concentration using EO-1 hyperion hyperspectral data. Int J Remote Sens 34(2):712–728
Abunama T, Othman F (2017) Time series analysis and forecasting of wastewater inflow into Bandar Tun Razak Sewage Treatment Plant in Selangor, Malaysia. In: IOP conference series: materials science and engineering, vol 210(1)
Amatya DM, Skaggs RW, Gregory JD (1997) Evaluation of a watershed scale forest hydrologic model. Agric Water Manag 32(3):239–258
Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Comput 9(7):1545–1588
Bennett ND, Croke BFW, Guariso G, Guillaume JHA, Hamilton SH, Jakeman AJ, Marsili-Libelli S, Newham LTH, Norton JP, Perrin C, Pierce SA, Robson B, Seppelt R, Voinov AA, Fath BD, Andreassian V (2013) Characterising performance of environmental models. Environ Model Softw 40:1–20
Boyd G, Na D, Li Z, Snowling S, Zhang Q, Zhou P (2019) Influent forecasting for wastewater treatment plants in North America. Sustainability 11(6):1764
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L (2002) Manual on setting up, using, and understanding random forests v3.1. http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf
Büyükalaca O, Bulut H, Yılmaz T (2001) Analysis of variable-base heating and cooling degree-days for Turkey. Appl Energy 69(4):269–283
Campisano A, Cabot Ple J, Muschalla D, Pleau M, Vanrolleghem PA (2013) Potential and limitations of modern equipment for real time control of urban wastewater systems. Urban Water J 10(5):300–311
Dai B, Gu C, Zhao E, Qin X (2018) Statistical model optimized random forest regression model for concrete dam deformation monitoring. Struct Control Health Monit 25(6):1–15
Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7(1):3
Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and ranomization. Mach Learn 40(2):139–157
Djebbar Y, Kadora PT (1998) Estimating sanitary flows using neural networks. Water Sci Technol 38(10):215–222
Dunsmore IR (1968) A bayesian approach to calibration. J R Stat Soc 30(2):396–405
Dürrenmatt DJÔ, Gujer W (2012) Data-driven modeling approaches to support wastewater treatment plant operation. Environ Model Softw 30:47–56
El-Din AG, Smith DW (2002) A neural network model to predict the wastewater inflow incorporating rainfall events. Water Res 36(5):1115–1126
Fabian P, Gael V, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(477):359–378
González PA, Zamarreño JM (2005) Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy Build 37(6):595–601
Grömping U (2009) Variable importance assessment in regression: linear regression versus random forest. Am Stat 63(4):308–319
Gupta HV, Kling H, Yilmaz KK, Martinez GF (2009) Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J Hydrol 377(1–2):80–91
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Jain SK, Sudheer KP (2008) Fitting of hydrologic models: a close look at the Nash–Sutcliffe index. J Hydrol Eng 13(10):981–986
Jothiprakash V, Kote AS (2011) Improving the performance of data-driven techniques through data pre-processing for modelling daily reservoir inflow. Hydrol Sci J 56(1):168–186
Kim JR, Ko JH, Im JH, Lee SH, Kim SH, Kim CW, Park TJ (2006) Forecasting influent flow rate and composition with occasional data for supervisory management system by time series model. Water Sci Technol 53(4–5):185–192
Kim M, Kim Y, Kim H, Piao W, Kim C (2016) Evaluation of the k-nearest neighbor method for forecasting the influent characteristics of wastewater treatment plant. Front Environ Sci Eng 10(2):299–310
Li Z, Huang G, Han J, Wang X, Fan Y, Cheng G, Zhang H, Huang W (2015) Development of a stepwise-clustered hydrological inference model. J Hydrol Eng 20(10):04015008
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2(3):18–22
Loh WY (2014) Classification and regression tree methods. Wiley StatsRef: Statistics Reference Online
Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
Mello CR, Viola MR, Norton LD, Silva AM, Weimar FA (2008) Development and application of a simple hydrologic model simulation for a Brazilian headwater basin. CATENA 75(3):235–247
Moriasi DN, Arnold JG, Van Liew MW, Bingner RL, Harmel RD, Veith TL (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans ASABE 50(3):885–900
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I—a discussion of principles. J Hydrol 10(3):282–290
Olmedo MTC, Paegelow M, Mas JF, and Escobar F (eds) (2018) Geomatic approaches for modeling land change scenarios. Springer, Switzerland
Pagano TC, Garen DC, Perkins TR, Pasteris PA (2009) Daily updating of operational statistical seasonal water supply forecasts for the Western U.S. J Am Water Resour Assoc 45(3):767–778
Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222
Papacharalampous G, Tyralis H, Koutsoyiannis D (2018) One-step ahead forecasting of geophysical processes within a purely statistical framework. Geosci Lett 5(1):1–19
Papacharalampous G, Tyralis H, Koutsoyiannis D (2019) Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes. Stochast Environ Res Risk Assess 32(2):481–514
Ponomarenko A, Avrelin N, Naidan B, Boytsov L (2014) Comparative analysis of data structures for approximate nearest neighbor search. In: Data analytics, pp 125–130
Probst P, Boulesteix A-L (2018) To tune or not to tune the number of trees in random forest? J Mach Learn Res 18:1–18
Singh RP, Gao PX, Lizotte DJ (2012) On hourly home peak load prediction. In: 2012 IEEE 3rd international conference on smart grid communications, SmartGridComm 2012. IEEE, pp 163–166
Szelag B, Bartkiewicz L, Studziński J, Barbusiński K (2017) Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear. Arch Environ Protect 43(3):74–81
Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J Hydrol 504:69–79
Tiwari MK, Chatterjee C (2011) A new wavelet–bootstrap–ANN hybrid model for daily discharge forecasting. J Hydroinform 13(3):500–519
Tyralis H, Papacharalampous G (2017) Variable selection in time series forecasting using random forests. Algorithms 10(4):114
Tyralis H, Papacharalampous G, Langousis A (2019a) A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11(5):910
Tyralis H, Papacharalampous G, Tantanee S (2019b) How to explain and predict the shape parameter of the generalized extreme value distribution of streamflow extremes using a big dataset. J Hydrol 574:628–645
Wang Z, Lai C, Chen X, Yang B, Zhao S, Bai X (2015) Flood hazard risk assessment model based on random forest. J Hydrol 527:1130–1141
Wei X, Kusiak A (2015) Short-term prediction of influent flow in wastewater treatment plant. Stoch Env Res Risk Assess 29(1):241–249
Wei X, Kusiak A, Sadat HR (2013) Prediction of influent flow rate: data-mining approach. J Energy Eng 139:118–123
Winkler RL (1972) A decision-theoretic approach to interval estimation. J Am Stat Assoc 67(337):187–191
Yeh AG, Li X (2002) Urban simulation using neural networks and cellular automata for land use planning. In: Advances in spatial data handling. pp 451–464.
Zahedi P, Parvandeh S, Asgharpour A, McLaury BS, Shirazi SA, McKinney BA (2018) Random forest regression prediction of solid particle Erosion in elbows. Powder Technol 338:983–992
Zhang D, Martinez N, Lindholm G, Ratnaweera H (2018) Manage sewer in-line storage control using hydraulic model and recurrent neural network. Water Resour Manag 32(6):2079–2098
Zhou Z, Wen C, Yang C (2015) Fault detection using random projections and k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Trans Semicond Manuf 28(1):70–79
Acknowledgements
This study was supported by the Southern Ontario Water Consortium and the Natural Science and Engineering Research Council of Canada. The authors would like to thank the contributions of engineers at Hydromantis and the WWTPs for their suggestions and comments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhou, P., Li, Z., Snowling, S. et al. A random forest model for inflow prediction at wastewater treatment plants. Stoch Environ Res Risk Assess 33, 1781–1792 (2019). https://doi.org/10.1007/s00477-019-01732-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00477-019-01732-9
Keywords
- Random forest
- WWTP
- Wastewater prediction
- Daily flow
- Uncertainty analysis