Abstract
Due to sensor malfunctions and communication faults, multiple missing patterns frequently occur in wastewater treatment process (WWTP) data. However, existing missing data imputation methods cannot handle multiple missing patterns because they do not make sufficient use of the available data information. In this article, a double-cycle weighted imputation (DCWI) method is proposed to deal with multiple missing patterns by maximizing the utilization of the available information in variables and instances. The proposed DCWI comprises two components: a double-cycle-based imputation sorting and a weighted K nearest neighbor-based imputation estimator. First, the double-cycle mechanism, which combines missing variable sorting and missing instance sorting, is applied to guide the order in which missing values are imputed. Second, the weighted K nearest neighbor-based imputation estimator is used to acquire globally similar instances and capture the volatility in the local region. The estimator preserves the original data characteristics as much as possible and enhances the imputation accuracy. Finally, experimental results on simulated and real WWTP datasets with non-stationarity and nonlinearity demonstrate that the proposed DCWI produces more accurate imputation results than the comparison methods under different missing patterns and missing ratios.
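The two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' DCWI implementation: the abstract does not state the sorting criteria, the distance measure, or the weighting scheme, so the choices below are assumptions made only for illustration. The sketch assumes that variables and instances with fewer missing entries are imputed first, that similarity is a root-mean-square distance over the variables two instances share, and that neighbors are weighted by inverse distance. The function name `sorted_weighted_knn_impute` and its parameters `k` and `eps` are hypothetical.

```python
import numpy as np

def sorted_weighted_knn_impute(X, k=3, eps=1e-8):
    """Illustrative sorted, weighted KNN imputation (assumed design, not DCWI itself)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    # Outer cycle (assumption): visit variables with fewer missing entries first.
    col_order = np.argsort(missing.sum(axis=0), kind="stable")
    for j in col_order:
        rows = np.flatnonzero(missing[:, j])
        if rows.size == 0:
            continue
        # Inner cycle (assumption): visit instances with fewer missing entries first.
        rows = rows[np.argsort(missing[rows].sum(axis=1), kind="stable")]
        # Donors are instances whose value in variable j was originally observed.
        donors = np.flatnonzero(~missing[:, j])

        for i in rows:
            dists = np.full(donors.size, np.inf)
            for n, d in enumerate(donors):
                # Compare only variables available in both instances, excluding j.
                shared = ~np.isnan(filled[i]) & ~np.isnan(filled[d])
                shared[j] = False
                if shared.any():
                    diff = filled[i, shared] - filled[d, shared]
                    dists[n] = np.sqrt(np.mean(diff ** 2))

            nearest = np.argsort(dists, kind="stable")[:k]
            nearest = nearest[np.isfinite(dists[nearest])]
            if nearest.size == 0:
                continue  # no comparable donor: leave the gap unfilled

            # Inverse-distance weights (assumption): closer donors count more.
            w = 1.0 / (dists[nearest] + eps)
            filled[i, j] = np.sum(w * X[donors[nearest], j]) / np.sum(w)
    return filled

# Toy data, invented for illustration only.
demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0]])
demo_filled = sorted_weighted_knn_impute(demo, k=2)
print(demo_filled)
```

In the toy example, the two nearest donors of the incomplete instance are equally distant, so the gap is filled with roughly the mean of their values. Because values imputed earlier are reused when computing later distances, the processing order affects the result, which is the motivation for sorting before imputing.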
Additional information
This work was supported by the National Key Research and Development Project (Grant No. 2018YFC1900800-5), the National Natural Science Foundation of China (Grant Nos. 61890930-5, 61903010, 62021003 and 62125301), Beijing Natural Science Foundation (Grant No. KZ202110005009), and Beijing Outstanding Young Scientist Program (Grant No. BJJWZYJH 01201910005020).
About this article
Cite this article
Han, H., Sun, M., Wu, X. et al. Double-cycle weighted imputation method for wastewater treatment process data with multiple missing patterns. Sci. China Technol. Sci. 65, 2967–2978 (2022). https://doi.org/10.1007/s11431-022-2163-1
DOI: https://doi.org/10.1007/s11431-022-2163-1