Double-cycle weighted imputation method for wastewater treatment process data with multiple missing patterns

  • Article
  • Published in Science China Technological Sciences

Abstract

Due to sensor malfunctions and communication faults, multiple missing patterns frequently occur in wastewater treatment process (WWTP) data. However, existing missing data imputation methods cannot handle multiple missing patterns because they do not make sufficient use of the information in the data. In this article, a double-cycle weighted imputation (DCWI) method is proposed to deal with multiple missing patterns by maximizing the utilization of the available information in variables and instances. The proposed DCWI comprises two components: a double-cycle-based imputation sorting and a weighted K nearest neighbor-based imputation estimator. First, the double-cycle mechanism, which combines missing variable sorting and missing instance sorting, is applied to direct the order of missing-value imputation. Second, the weighted K nearest neighbor-based imputation estimator is used to identify globally similar instances and capture the volatility in the local region. The estimator preserves the original data characteristics as much as possible and enhances the imputation accuracy. Finally, experimental results on simulated and real WWTP datasets with non-stationarity and nonlinearity demonstrate that the proposed DCWI produces more accurate imputation results than the comparison methods under different missing patterns and missing ratios.
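The abstract does not give the sorting criteria or the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' DCWI algorithm. It assumes that variables and instances are visited in ascending order of their missing counts, that neighbors are weighted by inverse Euclidean distance, and that imputed values are reused in later steps. The function name `double_cycle_knn_sketch` and the parameter `k` are hypothetical.

```python
import numpy as np

def double_cycle_knn_sketch(X, k=3):
    """Illustrative ordered, weighted KNN imputation (assumptions, not the paper's DCWI)."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)  # original missing mask, used only for ordering
    # Outer cycle (assumed rule): variables with the fewest missing values first.
    var_order = np.argsort(missing.sum(axis=0), kind="stable")
    for j in var_order:
        rows = np.flatnonzero(missing[:, j])
        # Inner cycle (assumed rule): instances with the fewest missing entries first.
        rows = rows[np.argsort(missing[rows].sum(axis=1), kind="stable")]
        for i in rows:
            # Features of instance i that are currently known (observed or already imputed).
            obs_idx = np.flatnonzero(~np.isnan(X[i]))
            # Donors have a value in column j and in every feature known for instance i.
            has_target = ~np.isnan(X[:, j])
            has_feats = ~np.isnan(X[:, obs_idx]).any(axis=1)
            donors = np.flatnonzero(has_target & has_feats)
            if obs_idx.size == 0 or donors.size == 0:
                # Fallback (assumption): column mean when no neighbor can be compared.
                X[i, j] = np.nanmean(X[:, j])
                continue
            dist = np.linalg.norm(X[donors][:, obs_idx] - X[i, obs_idx], axis=1)
            nearest = np.argsort(dist, kind="stable")[:k]
            # Inverse-distance weights (assumed); the epsilon avoids division by zero.
            w = 1.0 / (dist[nearest] + 1e-8)
            X[i, j] = np.sum(w * X[donors[nearest], j]) / np.sum(w)
    return X

if __name__ == "__main__":
    demo = np.array([[1.0, 2.0, 3.0],
                     [1.0, 2.0, np.nan],
                     [5.0, 6.0, 7.0],
                     [5.0, np.nan, 7.0]])
    print(double_cycle_knn_sketch(demo, k=1))
```

Writing each imputed value back into `X` lets later imputations draw on it, which is one simple way to increase the information available under mixed missing patterns. The sketch does not scale the features and does not handle a column that is entirely missing.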

Author information

Corresponding author

Correspondence to HongGui Han.

Additional information

This work was supported by the National Key Research and Development Project (Grant No. 2018YFC1900800-5), the National Natural Science Foundation of China (Grant Nos. 61890930-5, 61903010, 62021003 and 62125301), Beijing Natural Science Foundation (Grant No. KZ202110005009), and Beijing Outstanding Young Scientist Program (Grant No. BJJWZYJH 01201910005020).

About this article

Cite this article

Han, H., Sun, M., Wu, X. et al. Double-cycle weighted imputation method for wastewater treatment process data with multiple missing patterns. Sci. China Technol. Sci. 65, 2967–2978 (2022). https://doi.org/10.1007/s11431-022-2163-1

  • DOI: https://doi.org/10.1007/s11431-022-2163-1
