
Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies

  • Research Article
Frontiers of Environmental Science & Engineering

Abstract

Accurate influent flow rate prediction is important for operators and managers at wastewater treatment plants (WWTPs), as influent flow is closely related to wastewater characteristics such as biochemical oxygen demand (BOD), total suspended solids (TSS), and pH. Previous studies on influent flow rate prediction have shown that data-driven models are effective tools. However, most of these studies have focused on batch learning. Batch learning is inadequate for wastewater prediction in the era of COVID-19, when influent patterns changed significantly. Online learning has distinct advantages in handling streaming data, large data sets, and changing data patterns, so it has the potential to address this issue. This study compared two groups of models for predicting influent flow rate at two Canadian WWTPs: the conventional batch learning models Random Forest (RF), K-Nearest Neighbors (KNN), and Multi-Layer Perceptron (MLP), and their respective online learning counterparts, Adaptive Random Forest (aRF), Adaptive K-Nearest Neighbors (aKNN), and Adaptive Multi-Layer Perceptron (aMLP). In all scenarios, the online learning models achieved the highest R², the lowest mean absolute percentage error (MAPE), and the lowest root mean square error (RMSE) compared with the conventional batch learning models. For 24-h ahead prediction on the testing data set, the R² values of the aRF, aKNN, and aMLP were 0.90, 0.73, and 0.87, respectively, at Plant A, and 0.75, 0.78, and 0.56, respectively, at Plant B. The proposed online learning models make reliable predictions under changing data patterns and handle continuous, large influent data streams efficiently. They can provide robust decision support for wastewater treatment and management in the changing era of COVID-19. They can also do so under other unprecedented emergencies that could change influent patterns.
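The abstract compares the models using R², MAPE, and RMSE. The sketch below computes these metrics from observed and predicted flow rates. It assumes the standard textbook definitions, because the paper's exact formulas are not given here. The function names and the four-value example are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (undefined if any observation is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the flow rate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed vs. predicted flow rates (illustrative values only).
observed = [100.0, 120.0, 110.0, 130.0]
predicted = [98.0, 125.0, 108.0, 128.0]
print(round(r2_score(observed, predicted), 3),
      round(mape(observed, predicted), 2),
      round(rmse(observed, predicted), 2))
# prints: 0.926 2.38 3.04
```

R² is scale-free and RMSE is expressed in flow units. MAPE expresses errors relative to the observed flow, which is why reporting all three gives a fuller picture than any single metric.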



Author information


Corresponding authors

Correspondence to Zhong Li or Yimei Zhang.

Additional information

Highlights

• Online learning models accurately predict influent flow rate at wastewater plants.

• Models adapt to changing input-output relationships and scale well to large data sets.

• Online learning models outperform conventional batch learning models.

• An optimal prediction strategy is identified through uncertainty analysis.

• The proposed models provide support for coping with emergencies like COVID-19.
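The highlights state that the online models adapt to changing input-output relationships. The paper's aRF, aKNN, and aMLP are not reproduced here. Instead, the code below is a simplified, hypothetical illustration of the idea: a sliding-window K-nearest-neighbors regressor evaluated prequentially, meaning it predicts each new sample and then learns from it. It is contrasted with a model fitted once and never updated. The class and function names, the window size, and the synthetic stream with an abrupt level drop are all assumptions, not the authors' data or implementation.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]


class SlidingWindowKNNRegressor:
    """Hypothetical minimal KNN regressor that learns one sample at a time.

    With a finite `window`, only the most recent samples are kept, so patterns
    that no longer hold are forgotten as new data arrive. `window=None` keeps
    every sample (unbounded memory).
    """

    def __init__(self, k: int = 3, window: Optional[int] = 100) -> None:
        self.k = k
        self.buffer: Deque[Sample] = deque(maxlen=window)

    def predict_one(self, x: Sequence[float]) -> float:
        if not self.buffer:
            return 0.0  # no history yet; a real system would need a fallback
        # Squared Euclidean distance to each stored sample; ties broken by target value.
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in self.buffer
        )
        nearest = ranked[: self.k]
        return sum(yi for _, yi in nearest) / len(nearest)

    def learn_one(self, x: Sequence[float], y: float) -> None:
        self.buffer.append((tuple(x), y))  # deque drops the oldest sample when full


def prequential(model: SlidingWindowKNNRegressor, stream: Sequence[Sample]) -> List[float]:
    """Test-then-train: predict each sample before learning from it."""
    preds = []
    for x, y in stream:
        preds.append(model.predict_one(x))
        model.learn_one(x, y)
    return preds


# Synthetic stream (illustrative only): the feature is the hour of day, and the
# flow level drops abruptly at t = 100 to mimic a lockdown-style pattern change.
stream: List[Sample] = [
    ((float(t % 24),), (100.0 if t < 100 else 60.0) + t % 24) for t in range(200)
]

online = SlidingWindowKNNRegressor(k=3, window=48)
preds = prequential(online, stream)
errors = [abs(p - y) for p, (_, y) in zip(preds, stream)]

# "Batch" baseline: fit once on the first 100 samples and never update.
batch = SlidingWindowKNNRegressor(k=3, window=None)
for x, y in stream[:100]:
    batch.learn_one(x, y)
batch_errors = [abs(batch.predict_one(x) - y) for x, y in stream[150:]]

print(round(errors[100], 2), round(max(errors[150:]), 2), max(batch_errors))
# prints: 39.67 0.33 40.0
```

Both models miss badly right after the shift. Once the 48-sample window holds only post-shift data (about two days of hourly samples later), the online model tracks the new level, while the frozen model keeps predicting the old one. The window length trades adaptation speed against noise: a short window reacts quickly but averages over fewer samples.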

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Data Availability Statement

The data used in this study were acquired from Hatch Ltd., which has not given permission for researchers to share them. Data requests can be made to the company.


About this article


Cite this article

Zhou, P., Li, Z., Zhang, Y. et al. Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies. Front. Environ. Sci. Eng. 17, 152 (2023). https://doi.org/10.1007/s11783-023-1752-7


  • DOI: https://doi.org/10.1007/s11783-023-1752-7
