Abstract
With the advent of the big data era, data quality has become an increasingly critical problem. Missing values are one of the primary causes of poor data quality, so developing effective imputation models is a key research topic. A major recent research direction is to fill missing values with neural network models such as self-organizing maps or autoencoders. However, these classical methods can hardly discover interrelated features and common features among data attributes at the same time. In particular, classical autoencoders often learn invalid constant mappings, which dramatically hurts filling performance. To solve these problems, we propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first incorporate into an autoencoder a hidden layer consisting of de-tracking neurons and radial basis function neurons, which enhances its ability to learn interrelated features and common features. In addition, we develop a missing-value-filling strategy based on dynamic clustering and incorporate it into an iterative optimization process. This design enhances the model's multi-dimensional feature fusion ability and thus improves dynamic collaborative missing-value-filling performance. Extensive experiments on thirteen data sets, comparing the proposed model against a variety of baseline methods, validate its effectiveness.
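As a rough illustration of the hidden layer described in the abstract, the following NumPy sketch shows one possible reading of the two neuron types. It assumes that a de-tracking neuron produces the activation used to reconstruct attribute j from every input except x[j], so that the network cannot simply copy an input to its own output, and that a radial basis function neuron responds to the distance between the sample and a center. The function names, the tanh activation, the `gamma` width parameter, and all weight shapes are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def detracking_hidden(x, W, b):
    """Hidden activations with the j-th input removed for the j-th output.

    x: (d,) input sample, W: (h, d) input weights, b: (h,) biases.
    Returns an array of shape (d, h); row j holds the hidden activations
    that would be used to reconstruct x[j], computed without x[j] itself.
    """
    full = W @ x + b                      # (h,) net input using every attribute
    # Remove each attribute's own contribution W[k, j] * x[j] for output j.
    net = full[None, :] - (W * x).T       # (d, h)
    return np.tanh(net)                   # tanh is an assumed activation


def rbf_hidden(x, centers, gamma):
    """Gaussian RBF activations of sample x with respect to each center.

    centers: (k, d) array of centers, gamma: assumed width parameter.
    Returns an array of shape (k,).
    """
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-gamma * sq_dist)


def reconstruct(x, W, b, centers, gamma, V_dt, V_rbf, c):
    """Combine both neuron types into one reconstruction of x.

    V_dt: (d, h) output weights for the de-tracking activations,
    V_rbf: (d, k) output weights for the RBF activations,
    c: (d,) output biases. Returns an array of shape (d,).
    """
    h_dt = detracking_hidden(x, W, b)     # (d, h)
    h_rbf = rbf_hidden(x, centers, gamma) # (k,)
    return np.sum(V_dt * h_dt, axis=1) + V_rbf @ h_rbf + c
```

In this sketch, row j of the de-tracking activations does not change when x[j] changes, which is the property that would allow a missing x[j] to be estimated from the remaining attributes.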
Notes
This paper is an extended version of [11], which was accepted for presentation at the 15th International FLINS Conference on Machine Learning, Multi Agent and Cyber Physical Systems (FLINS 2022).
References
Canbek G (2022) Gaining insights in datasets in the shade of “garbage in, garbage out” rationale: Feature space distribution fitting. Wiley Interdisciplinary Reviews: Data Min Knowl Disc 12(3):1456
Xue Z, Wang H (2021) Effective density-based clustering algorithms for incomplete data. Big Data Min Anal 4(3):183–194
Kabir S, Farrokhvar L (2022) Non-linear missing data imputation for healthcare data via index-aware autoencoders. Health Care Manag Sci 1–14
Lai X, Wu X, Zhang L, Lu W, Zhong C (2019) Imputations of missing values using a tracking-removed autoencoder trained with incomplete data. Neurocomputing 366:54–65
Lai X, Wu X, Zhang L, Zhang G (2019) Imputation using a correlation-enhanced auto-associative neural network with dynamic processing of missing values. In: International Symposium on Neural Networks, pp. 223–231
Liu K, Lu N, Wu F, Zhang R, Gao F (2022) Model fusion and multiscale feature learning for fault diagnosis of industrial processes. IEEE Trans Cybernet
Vatanen T, Osmala M, Raiko T, Lagus K, Sysi-Aho M, Orešič M, Honkela T, Lähdesmäki H (2015) Self-organization and missing values in som and gtm. Neurocomputing 147:60–70
Yousefi-Azar M, Varadharajan V, Hamey L, Tupakula U (2017) Autoencoder-based feature learning for cyber security applications. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3854–3861. IEEE
Daoud M, Mayo M, Cunningham SJ (2019) Rbfa: radial basis function autoencoders. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 2966–2973. IEEE
Ravi V, Krishna M (2014) A new online data imputation method based on general regression auto associative neural network. Neurocomputing 138:106–113
Liu X, Du S, Teng F, Li T (2022) A missing value filling model based on feature fusion enhanced autoencoder. In: 15th International FLINS Conference on Machine Learning, Multi Agent and Cyber Physical Systems
Hamzah FB, Hamzah FM, Razali SM, Samad H (2021) A comparison of multiple imputation methods for recovering missing data in hydrological studies. Civ Eng J 7(9):1608–1619
Li D, Zhang H, Li T, Bouras A, Yu X, Wang T (2021) Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set. IEEE Transactions on Fuzzy Systems 30(5):1396–1408
Rumaling MI, Chee FP, Dayou J, Chang J, Sentian J (2020) Missing value imputation for pm10 concentration in sabah using nearest neighbour method (nnm) and expectation-maximization (em) algorithm. Asian J Atmos Environ 14(1):62–72
Ma B, Li C, Jiang L (2022) A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning. Appl Intell 1–13
Tutz G, Ramzan S (2015) Improved methods for the imputation of missing data by nearest neighbor methods. Comput Stat Data Anal 90:84–99
Wang M, Li D, Xue C, Qi K, Yang E (2019) Sknn algorithm for filling missing oil data based on knn. IOP Conf Ser Mater Sci Eng 612:032099
Migdady H, Al-Talib MM (2018) An enhanced fuzzy k-means clustering with application to missing data imputation. Electron J Appl Stat Anal 11(2):674–686
Li D, Zhang H, Li T, Bouras A, Yu X, Wang T (2021) Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set. IEEE Trans Fuzzy Syst PP, 1–1
Deng W, Guo Y, Liu J, Li Y, Liu D, Zhu L (2019) A missing power data filling method based on improved random forest algorithm. Chinese J Electr Eng 5(4):33–39
Noei M, Abadeh MS (2019) A genetic asexual reproduction optimization algorithm for imputing missing values. In: 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 214–218
Mostafa SM, Eladimy AS, Hamad S, Amano H (2020) Cbrl and cbrc: Novel algorithms for improving missing value imputation accuracy based on bayesian ridge regression. Symmetry 12(10):1594
Tang S, Yuan S, Zhu Y (2019) Deep learning-based intelligent fault diagnosis methods toward rotating machinery. IEEE Access 8:9335–9346
Al-Kaabi K, Monsefi R, Zabihzadeh D (2022) A framework to enhance generalization of deep metric learning methods using general discriminative feature learning and class adversarial neural networks. Appl Intell 1–19
Saad M, Chaudhary M, Karray F, Gaudet V (2020) Machine learning based approaches for imputation in time series data and their impact on forecasting. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2621–2627
Wang T, Ke H, Jolfaei A, Wen S, Haghighi MS, Huang S (2022) Missing value filling based on the collaboration of cloud and edge in artificial intelligence of things. IEEE Trans Ind Inform 18(8):5394–5402
Sanjar K, Bekhzod O, Kim J, Paul A, Kim J (2020) Missing data imputation for geolocation-based price prediction using knn-mcf method. ISPRS Int J Geo-Inform 9(4):227
Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX (2019) Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 115:213–237
Lall R, Robinson T (2022) The midas touch: Accurate and scalable missing-data imputation with deep learning. Political Anal 30(2):179–196
Rodríguez-Fdez I, Canosa A, Mucientes M, Bugarín A (2015) STAC: a web platform for the comparison of algorithms using statistical tests. In: Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)
Acknowledgements
This work is supported by the National Key R&D Program of China (No. 2020AAA0105101) and the National Natural Science Foundation of China (Nos. 62276215, 62176221, 61976247).
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Liu, X., Du, S., Li, T. et al. A missing value filling model based on feature fusion enhanced autoencoder. Appl Intell 53, 24931–24946 (2023). https://doi.org/10.1007/s10489-023-04892-y
DOI: https://doi.org/10.1007/s10489-023-04892-y