Using Spark Distributed Deep Learning to Analyze NetFlow in Data Lake System

  • Conference paper
Frontier Computing (FC 2020)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 747)

Abstract

Using data to improve information security has become a research focus with the development of big data and artificial intelligence. However, traditional single-database technology cannot handle today's complex, large-scale data processing, while frequent network attacks themselves produce massive amounts of data. This paper proposes a framework for storing and analyzing Web log data based on the concept of data lakes. Processing, accessing, querying, analyzing, and visualizing large amounts of data in a short time is also critical. Therefore, the Grafana tool was used to give network administrators real-time monitoring and visualization for system management.

Acknowledgements

This work was sponsored by the Ministry of Science and Technology (MOST), Taiwan, under Grant Nos. 108-2622-E-029-007-CC3, 108-2221-E-029-010, and 108-2745-8-029-007.

Author information

Corresponding author

Correspondence to Chao-Tung Yang.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jiang, CT., Yang, CT., Chan, YW., Kristiani, E., Liu, JC. (2021). Using Spark Distributed Deep Learning to Analyze NetFlow in Data Lake System. In: Chang, JW., Yen, N., Hung, J.C. (eds) Frontier Computing. FC 2020. Lecture Notes in Electrical Engineering, vol 747. Springer, Singapore. https://doi.org/10.1007/978-981-16-0115-6_20

  • DOI: https://doi.org/10.1007/978-981-16-0115-6_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0114-9

  • Online ISBN: 978-981-16-0115-6

  • eBook Packages: Computer Science, Computer Science (R0)
