Missing Data Estimation Using Firefly Algorithm

Leke, Collins Achepsah; Marwala, Tshilidzi

doi:10.1007/978-3-030-01180-2_5

Part of the book series: Studies in Big Data (SBD, volume 48)

Abstract

In this chapter, we examine the problem of missing data in high-dimensional datasets, considering the missing completely at random and missing at random mechanisms as well as the arbitrary missing pattern. The chapter employs a methodology that combines deep learning and swarm intelligence algorithms to provide reliable estimates for missing data. The deep learning technique extracts features from the input data through unsupervised learning, modeling the data distribution based on the input. After a supervised fine-tuning phase, the resulting network is used as part of the objective function for the swarm intelligence technique, which estimates the missing data by minimizing an error function based on the interrelationship and correlation between features in the dataset. The proposed methodology has longer running times; however, its promising outcomes justify the trade-off. Basic knowledge of statistics is presumed.
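The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. A linear, PCA-style autoencoder stands in for the trained deep network, the error function is assumed to be the squared difference between a record and its reconstruction, and a basic firefly algorithm searches over the missing entries. All function names, parameter values, and the synthetic data are hypothetical and not taken from the chapter.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """PCA-style stand-in for the trained deep autoencoder (an assumption)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T  # (n_features, k) projection onto the top-k components
    return lambda x: (x - mean) @ W @ W.T + mean

def reconstruction_error(reconstruct, record, missing_idx, candidate):
    """Assumed error function: squared gap between a record and its reconstruction."""
    x = record.copy()
    x[missing_idx] = candidate  # plug candidate values into the missing slots
    return float(np.sum((x - reconstruct(x)) ** 2))

def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Basic firefly algorithm: dimmer fireflies move toward brighter ones."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_fireflies, dim))
    cost = np.array([objective(p) for p in pos])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower error)
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)  # attractiveness decays with distance
                    step = alpha * (rng.random(dim) - 0.5) * (hi - lo)
                    pos[i] = np.clip(pos[i] + beta * (pos[j] - pos[i]) + step, lo, hi)
                    cost[i] = objective(pos[i])
        alpha *= 0.97  # gradually shrink the random walk
    best = int(np.argmin(cost))
    return pos[best], float(cost[best])

# Synthetic, correlated data in [0, 1] (illustrative only).
data_rng = np.random.default_rng(1)
X = data_rng.random((200, 2)) @ (data_rng.random((2, 5)) / 2.0)
reconstruct = fit_linear_autoencoder(X, k=2)

record = X[0].copy()
missing_idx = np.array([1, 3])  # pretend these two features are missing

def objective(candidate):
    return reconstruction_error(reconstruct, record, missing_idx, candidate)

estimate, err = firefly_minimize(objective, dim=len(missing_idx), bounds=(0.0, 1.0))
print("estimated:", estimate, "true:", record[missing_idx], "error:", err)
```

In this sketch the brightest firefly never moves, so the best error found cannot get worse from one iteration to the next. The triple loop also illustrates why running time grows quickly with swarm size and iteration count, since each move triggers another pass through the network.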

Author information

Authors and Affiliations

Faculty of Engineering and Built Environment, University of Johannesburg, Auckland Park, South Africa
Collins Achepsah Leke & Tshilidzi Marwala

Corresponding author

Correspondence to Collins Achepsah Leke.

About this chapter

Cite this chapter

Leke, C.A., Marwala, T. (2019). Missing Data Estimation Using Firefly Algorithm. In: Deep Learning and Missing Data in Engineering Systems. Studies in Big Data, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-01180-2_5

DOI: https://doi.org/10.1007/978-3-030-01180-2_5
Published: 14 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01179-6
Online ISBN: 978-3-030-01180-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
