
Power Data Cleaning Method Based on Isolation Forest and LSTM Neural Network

  • Conference paper
  • First Online:
Cloud Computing and Security (ICCCS 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11067)


Abstract

In the context of big data in power systems, cleaning power operation and maintenance data can effectively improve data quality and provide a sound basis for data analysis. In data cleaning, accurately detecting anomalies in power data and keeping the error of data correction low have been persistent technical difficulties. To address these problems, we propose a data cleaning method based on a Correlation isolation Forest and an Attention-based LSTM (CiF-AL). The method constructs an isolation forest based on the correlation between data attributes to extract features of the training dataset and detect anomalous data in the dataset, and then uses an improved LSTM neural network model with an attention mechanism to predict and correct the anomalous data. Experimental results show that the CiF-AL-based cleaning scheme for power operation and maintenance data improves the accuracy of locating anomalous data and the accuracy of data correction, while also reducing training time and resource consumption.
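The detection step builds on the isolation forest of Liu et al., which scores a point by how few random axis-parallel splits are needed to isolate it. The sketch below is a minimal pure-Python version of that standard algorithm, not the correlation-based variant used in CiF-AL, whose construction the abstract does not specify. The names `IsolationForest`, `build_tree`, `path_length`, and `c_factor`, the defaults of 100 trees with subsamples of 256 rows, and the two-attribute voltage/frequency data are illustrative assumptions. One further simplification: the split attribute is drawn only among attributes that still vary within the current node.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points.

    Normalises path lengths and credits leaves that still hold several
    points when the depth limit is reached.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Grow one isolation tree by random axis-parallel splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Simplification: only pick among attributes that still vary in this node.
    candidates = []
    for dim in range(len(rows[0])):
        column = [row[dim] for row in rows]
        lo, hi = min(column), max(column)
        if lo < hi:
            candidates.append((dim, lo, hi))
    if not candidates:
        return {"size": len(rows)}
    dim, lo, hi = rng.choice(candidates)
    split = rng.uniform(lo, hi)
    left = [row for row in rows if row[dim] < split]
    right = [row for row in rows if row[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges from the root to the leaf x falls into, plus c(size) of that leaf."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two training rows")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit ceil(log2(psi))
        self.trees = [
            build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 indicate likely anomalies."""
        mean_path = sum(path_length(x, tree) for tree in self.trees) / self.n_trees
        return 2.0 ** (-mean_path / c_factor(self.psi))


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical readings: [voltage (V), frequency (Hz)].
    readings = [[gen.gauss(220.0, 2.0), gen.gauss(50.0, 0.1)] for _ in range(500)]
    forest = IsolationForest(seed=1).fit(readings)
    print("typical:", forest.score([220.0, 50.0]))
    print("voltage sag:", forest.score([180.0, 50.0]))
```

Scores near 1 mark points that are isolated unusually quickly, while scores well below 0.5 mark points inside dense regions. Where to place the anomaly threshold on this score is a tuning decision the sketch leaves open.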
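For the correction step, the abstract states that an attention-based LSTM predicts a replacement for each flagged value. Training an LSTM does not fit in a short self-contained example, so the sketch below keeps only the attention part: it predicts the next value by softmax attention pooling, using the most recent window as the query, earlier windows of raw values as keys, and the observation that followed each earlier window as the corresponding value. With a negative squared-distance score this is Gaussian-kernel (Nadaraya-Watson) regression. It is a stand-in under stated assumptions, not the paper's model; `attention_predict`, `correct_anomalies`, `window`, `temperature`, and the toy periodic load series are hypothetical.

```python
import math


def attention_predict(history, window=4, temperature=1.0):
    """Predict the value that follows `history` by softmax attention pooling.

    Query: the last `window` values. Keys: every earlier length-`window`
    slice. Values: the observation that followed each key slice.
    """
    if len(history) < window + 1:
        raise ValueError("need at least window + 1 past values")
    query = history[-window:]
    keys = [history[s:s + window] for s in range(len(history) - window)]
    values = [history[s + window] for s in range(len(history) - window)]
    # Score = negative squared distance / temperature (Gaussian-kernel attention).
    scores = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def correct_anomalies(series, flags, window=4, temperature=1.0):
    """Replace each flagged value with a prediction from the already-cleaned past.

    Flagged points with fewer than window + 1 predecessors are left unchanged.
    """
    cleaned = list(series)
    for i, is_anomaly in enumerate(flags):
        if is_anomaly and i > window:
            cleaned[i] = attention_predict(cleaned[:i], window, temperature)
    return cleaned


if __name__ == "__main__":
    pattern = [10.0, 12.0, 15.0, 12.0]  # hypothetical repeating load profile
    series = pattern * 5
    series[13] = 99.0  # injected anomaly; the true value is 12.0
    flags = [i == 13 for i in range(len(series))]  # would come from the detector
    cleaned = correct_anomalies(series, flags)
    print(round(cleaned[13], 6))  # prints 12.0
```

Because the keys are raw values, `temperature` has to be chosen relative to the scale of the data (or the series standardized first). An LSTM encoder, as in the paper, would instead learn the representation on which the attention scores are computed.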



Acknowledgments

This work was supported by a technology project of Guangdong Power Grid Co., Ltd. (GDKJQQ20161191).

Author information


Correspondence to XingNan Li.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Cai, Y., Zhu, W. (2018). Power Data Cleaning Method Based on Isolation Forest and LSTM Neural Network. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11067. Springer, Cham. https://doi.org/10.1007/978-3-030-00018-9_47


  • DOI: https://doi.org/10.1007/978-3-030-00018-9_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00017-2

  • Online ISBN: 978-3-030-00018-9

  • eBook Packages: Computer Science, Computer Science (R0)
