Data Imputation in Wireless Sensor Network Using Deep Learning Techniques

  • Conference paper
Data Analytics and Management

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 54)

Abstract

Missing values in the data sequences produced by wireless sensor networks (WSNs) are a prevalent issue. Common causes include network loss, sensor maintenance, and sensor failure. Recovering the missing values from WSN time series is difficult, and although many recovery methods have been proposed, they remain limited. This work discusses the use of deep learning algorithms for time series prediction on WSN data generated by real-time sensor devices. It examines the deep learning techniques available for time series forecasting, analyzes the results obtained with each, and identifies the best hybrid combination, which contains a bidirectional LSTM layer for deep examination of patterns in the data. The main motivation is to improve the working environment of WSN deployments, automate various maintenance processes, provide an effective method of data imputation for WSNs in real-time settings where data is scarce, and develop a method for effective forecasting. Another approach examined combines a CNN layer to find positional patterns in the data, an attention mechanism to focus on the relevant part of the sequence, an LSTM layer that makes the model an encoder–decoder, and a final dense layer that produces the output in the desired shape. The models are trained and tested on the Beijing Air Pollution PM2.5 dataset. To overcome the limited availability of data, the VLSW algorithm is used to generate a large sample of training data from the limited available datasets.
After studying the various models and their results across the hyperparameters tried, it is concluded that the LSTM encoder–decoder model with an attention mechanism, combined with VLSW, works best for imputing long runs of missing time series data. The other models studied and their results are also summarized. For practical, real-time data imputation, this work concludes that the CNN model with online training works best, as it takes less time and fewer resources to train. The results obtained are better than those of existing solutions. The SSIM model with VLSW was 31% more efficient than the other methods, while the CNN without VLSW was 76% more efficient than the other methods.
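The abstract does not spell out how VLSW generates training samples. The following is a minimal sketch, assuming VLSW refers to a variable-length sliding window that cuts many (input, target) pairs of differing lengths from one short series. The function name `vlsw_samples`, its parameters, and the "input window followed directly by target window" layout are illustrative assumptions, not the chapter's actual procedure.

```python
import random


def vlsw_samples(series, min_in, max_in, min_out, max_out, n_samples, seed=0):
    """Cut (input, target) pairs of varying lengths from a single series.

    Hypothetical helper: each sample picks a random input length, a random
    target length, and a random start offset, so one short series yields
    many distinct training pairs.
    """
    if len(series) < max_in + max_out:
        raise ValueError("series is shorter than the largest window pair")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        in_len = rng.randint(min_in, max_in)      # inclusive bounds
        out_len = rng.randint(min_out, max_out)   # inclusive bounds
        start = rng.randint(0, len(series) - in_len - out_len)
        split = start + in_len
        # The target window immediately follows the input window.
        samples.append((series[start:split], series[split:split + out_len]))
    return samples


# Usage: 48 hourly readings yield as many variable-length pairs as requested.
readings = [float(i) for i in range(48)]
pairs = vlsw_samples(readings, min_in=6, max_in=12, min_out=2, max_out=4,
                     n_samples=5)
```

Because window lengths vary, a model trained on such pairs would need to accept variable-length sequences (for example through padding and masking), which is one reason an encoder–decoder design pairs naturally with this kind of sampling.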

Author information

Corresponding author

Correspondence to Arun Solanki.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rani, S., Solanki, A. (2021). Data Imputation in Wireless Sensor Network Using Deep Learning Techniques. In: Khanna, A., Gupta, D., Pólkowski, Z., Bhattacharyya, S., Castillo, O. (eds) Data Analytics and Management. Lecture Notes on Data Engineering and Communications Technologies, vol 54. Springer, Singapore. https://doi.org/10.1007/978-981-15-8335-3_44
