A missing value filling model based on feature fusion enhanced autoencoder

Applied Intelligence

Abstract

With the advent of the big data era, data quality has become increasingly critical. Missing values are one of the primary data quality issues, so developing effective imputation models is a key research topic. A major recent direction is to use neural network models such as self-organizing maps or autoencoders to fill missing values. However, these classical methods can hardly discover interrelated features and common features among data attributes simultaneously. In particular, classical autoencoders often learn invalid constant mappings, which dramatically hurts filling performance. To solve these problems, we propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first incorporate into the autoencoder a hidden layer consisting of de-tracking neurons and radial basis function neurons, which enhances its ability to learn interrelated features and common features. In addition, we develop a missing-value-filling strategy based on dynamic clustering and incorporate it into an iterative optimization process. This design enhances multi-dimensional feature fusion and thus improves dynamic collaborative missing-value-filling performance. The effectiveness of the proposed model is validated by extensive experiments against a variety of baseline methods on thirteen data sets.
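
The abstract names three ingredients: de-tracking neurons, radial basis function neurons, and a dynamic-clustering fill inside an iterative loop. The paper's own formulation is not reproduced on this page, so the NumPy sketch below is only an illustration under stated assumptions. It assumes that a de-tracking neuron follows the tracking-removed idea of [4] (the hidden input used to reconstruct attribute j omits attribute j itself), that the RBF neurons are Gaussian, and it substitutes plain cluster means for the network's reconstruction in the refill step. All function names, the tanh activation, and the parameters sigma and k are hypothetical.

```python
import numpy as np

def rbf_features(X, centers, sigma=1.0):
    """Gaussian RBF activations, one column per center (assumed kernel)."""
    # Squared Euclidean distance between every row of X and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def detracked_hidden(X, W, b):
    """Hidden activations that ignore attribute j when reconstructing attribute j.

    Returns shape (n_samples, n_attributes, n_hidden); slice [:, j, :] is the
    hidden layer computed without the contribution of X[:, j].
    """
    full = X @ W + b                      # (n, h): net input from all attributes
    own = X[:, :, None] * W[None, :, :]   # (n, d, h): each attribute's own share
    return np.tanh(full[:, None, :] - own)

def iterative_cluster_fill(X, k=2, n_iter=10, seed=0):
    """Fill NaNs by alternating cluster assignment and refilling.

    Stand-in sketch: missing cells are refilled from cluster centers, not
    from a trained autoencoder as in the paper.
    """
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    centers = filled[rng.choice(len(filled), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)        # clusters are recomputed every pass
        for c in range(k):
            members = filled[labels == c]
            if len(members):              # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
        # Overwrite only the originally missing cells; observed values stay fixed.
        filled = np.where(mask, centers[labels], filled)
    return filled
```

Because the clusters are recomputed from the current filled values on every pass, the fill and the grouping influence each other, which is the sense in which the loop is "dynamic" here. In a fuller model the refill step would presumably use the network output built from `detracked_hidden` and `rbf_features`.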

Notes

  1. This paper is an extended version of [11], which has been accepted for presentation at the 15th International FLINS Conferences on Machine learning, Multi agent and Cyber physical systems (FLINS2022).

References

  1. Canbek G (2022) Gaining insights in datasets in the shade of “garbage in, garbage out” rationale: Feature space distribution fitting. Wiley Interdisciplinary Reviews: Data Min Knowl Disc 12(3):1456

  2. Xue Z, Wang H (2021) Effective density-based clustering algorithms for incomplete data. Big Data Min Anal 4(3):183–194

  3. Kabir S, Farrokhvar L (2022) Non-linear missing data imputation for healthcare data via index-aware autoencoders. Health Care Manag Sci 1–14

  4. Lai X, Wu X, Zhang L, Lu W, Zhong C (2019) Imputations of missing values using a tracking-removed autoencoder trained with incomplete data. Neurocomputing 366:54–65

  5. Lai X, Wu X, Zhang L, Zhang G (2019) Imputation using a correlation-enhanced auto-associative neural network with dynamic processing of missing values. In: International Symposium on Neural Networks, pp. 223–231

  6. Liu K, Lu N, Wu F, Zhang R, Gao F (2022) Model fusion and multiscale feature learning for fault diagnosis of industrial processes. IEEE Trans Cybernet

  7. Vatanen T, Osmala M, Raiko T, Lagus K, Sysi-Aho M, Orešič M, Honkela T, Lähdesmäki H (2015) Self-organization and missing values in SOM and GTM. Neurocomputing 147:60–70

  8. Yousefi-Azar M, Varadharajan V, Hamey L, Tupakula U (2017) Autoencoder-based feature learning for cyber security applications. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3854–3861. IEEE

  9. Daoud M, Mayo M, Cunningham SJ (2019) RBFA: radial basis function autoencoders. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 2966–2973. IEEE

  10. Ravi V, Krishna M (2014) A new online data imputation method based on general regression auto associative neural network. Neurocomputing 138:106–113

  11. Liu X, Du S, Teng F, Li T (2022) A missing value filling model based on feature fusion enhanced autoencoder. In: 15th International FLINS Conferences on Machine learning, Multi agent and Cyber physical systems

  12. Hamzah FB, Hamzah FM, Razali SM, Samad H (2021) A comparison of multiple imputation methods for recovering missing data in hydrological studies. Civ Eng J 7(9):1608–1619

  13. Li D, Zhang H, Li T, Bouras A, Yu X, Wang T (2021) Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set. IEEE Trans Fuzzy Syst 30(5):1396–1408

  14. Rumaling MI, Chee FP, Dayou J, Chang J, Sentian J (2020) Missing value imputation for PM10 concentration in Sabah using nearest neighbour method (NNM) and expectation-maximization (EM) algorithm. Asian J Atmos Environ 14(1):62–72

  15. Ma B, Li C, Jiang L (2022) A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning. Appl Intell 1–13

  16. Tutz G, Ramzan S (2015) Improved methods for the imputation of missing data by nearest neighbor methods. Comput Stat Data Anal 90:84–99

  17. Wang M, Li D, Xue C, Qi K, Yang E (2019) SKNN algorithm for filling missing oil data based on KNN. IOP Conf Ser Mater Sci Eng 612:032099

  18. Migdady H, Al-Talib MM (2018) An enhanced fuzzy k-means clustering with application to missing data imputation. Electron J Appl Stat Anal 11(2):674–686

  19. Li D, Zhang H, Li T, Bouras A, Yu X, Wang T (2021) Hybrid missing value imputation algorithms using fuzzy c-means and vaguely quantified rough set. IEEE Trans Fuzzy Syst PP, 1–1

  20. Deng W, Guo Y, Liu J, Li Y, Liu D, Zhu L (2019) A missing power data filling method based on improved random forest algorithm. Chinese J Electr Eng 5(4):33–39

  21. Noei M, Abadeh MS (2019) A genetic asexual reproduction optimization algorithm for imputing missing values. In: 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 214–218

  22. Mostafa SM, Eladimy AS, Hamad S, Amano H (2020) CBRL and CBRC: Novel algorithms for improving missing value imputation accuracy based on Bayesian ridge regression. Symmetry 12(10):1594

  23. Tang S, Yuan S, Zhu Y (2019) Deep learning-based intelligent fault diagnosis methods toward rotating machinery. IEEE Access 8:9335–9346

  24. Al-Kaabi K, Monsefi R, Zabihzadeh D (2022) A framework to enhance generalization of deep metric learning methods using general discriminative feature learning and class adversarial neural networks. Appl Intell, 1–19

  25. Saad M, Chaudhary M, Karray F, Gaudet V (2020) Machine learning based approaches for imputation in time series data and their impact on forecasting. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2621–2627

  26. Wang T, Ke H, Jolfaei A, Wen S, Haghighi MS, Huang S (2022) Missing value filling based on the collaboration of cloud and edge in artificial intelligence of things. IEEE Trans Ind Inform 18(8):5394–5402

  27. Sanjar K, Bekhzod O, Kim J, Paul A, Kim J (2020) Missing data imputation for geolocation-based price prediction using KNN-MCF method. ISPRS Int J Geo-Inform 9(4):227

  28. Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX (2019) Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 115:213–237

  29. Lall R, Robinson T (2022) The MIDAS touch: Accurate and scalable missing-data imputation with deep learning. Political Anal 30(2):179–196

  30. Rodríguez-Fdez I, Canosa A, Mucientes M, Bugarín A (2015) STAC: a web platform for the comparison of algorithms using statistical tests. In: Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)

Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2020AAA0105101) and the National Natural Science Foundation of China (Nos. 62276215, 62176221, 61976247).

Author information

Corresponding author

Correspondence to Shengdong Du.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, X., Du, S., Li, T. et al. A missing value filling model based on feature fusion enhanced autoencoder. Appl Intell 53, 24931–24946 (2023). https://doi.org/10.1007/s10489-023-04892-y

  • DOI: https://doi.org/10.1007/s10489-023-04892-y
