A random forest model for inflow prediction at wastewater treatment plants

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Influent flow is a crucial variable for the operation and management of wastewater treatment plants (WWTPs). In this study, a random forest (RF) model was applied to daily wastewater inflow prediction, and a new probabilistic prediction approach was applied, for the first time, to quantify the uncertainties associated with wastewater inflow prediction. The RF model uses regression trees to capture the nonlinear relationship between wastewater inflow and various influencing factors, such as weather features and domestic water usage patterns. The proposed model was applied to daily wastewater inflow prediction for two WWTPs (Humber and one confidential plant) in Ontario, Canada. For the confidential WWTP, the coefficient of determination (\(R^{2}\)) values for training and testing were 0.971 and 0.722, respectively. At the Humber WWTP, the \(R^{2}\) values were 0.957 and 0.584 for training and testing, respectively. In comparison with other approaches, such as multilayer perceptron (MLP) neural network models and autoregressive integrated moving average models, the results show that the RF model performs well in predicting inflow. In addition, probabilistic predictions of daily inflow were generated. For the Humber WWTP, 93.56% of the testing samples fall within their corresponding predicted intervals. For the confidential plant, 78 of the 89 observed values fall within their corresponding intervals, accounting for 87.64% of the testing samples. The results show that the probabilistic approach can provide robust decision support for the operation, management, and optimization of WWTPs.
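The abstract reports two kinds of score: the coefficient of determination and the share of observations that fall inside the predicted intervals. The sketch below shows how such scores are commonly computed. It assumes the 1 - SS_res/SS_tot definition of R² (the abstract does not state its formula, and some studies use the squared correlation coefficient instead) and inclusive interval bounds. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def interval_coverage(
    observed: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Fraction of observations inside [lower, upper], bounds inclusive."""
    if not (len(observed) == len(lower) == len(upper)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)


# Synthetic illustration only: 89 samples, 78 of which lie inside their
# interval, mirroring the counts quoted for the confidential plant.
obs = [1.0] * 89
low = [0.0] * 89
high = [2.0] * 78 + [0.5] * 11  # the last 11 intervals miss the observation
print(f"{100 * interval_coverage(obs, low, high):.2f}%")  # 87.64%
```

With these definitions, 78 hits out of 89 samples give 78/89 ≈ 0.8764, which matches the 87.64% coverage quoted for the confidential plant.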

Acknowledgements

This study was supported by the Southern Ontario Water Consortium and the Natural Sciences and Engineering Research Council of Canada. The authors thank the engineers at Hydromantis and the WWTPs for their suggestions and comments.

Author information

Corresponding author

Correspondence to Zhong Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, P., Li, Z., Snowling, S. et al. A random forest model for inflow prediction at wastewater treatment plants. Stoch Environ Res Risk Assess 33, 1781–1792 (2019). https://doi.org/10.1007/s00477-019-01732-9

  • DOI: https://doi.org/10.1007/s00477-019-01732-9