Abstract
Using data to improve information security has become a focus of research as big data and artificial intelligence have developed. However, traditional single-database technology cannot handle today's complex, large-scale data processing, while frequent network attacks themselves produce massive amounts of data. This paper proposes a framework for storing and analyzing Web log data based on the concept of data lakes. Processing, accessing, querying, analyzing, and visualizing large amounts of data in a short time is also critical. Therefore, the Grafana tool was used to provide network administrators with real-time monitoring and visualization for system management.
Acknowledgements
This work was sponsored by the Ministry of Science and Technology (MOST), Taiwan, under Grant Nos. 108-2622-E-029-007-CC3, 108-2221-E-029-010, and 108-2745-8-029-007.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jiang, CT., Yang, CT., Chan, YW., Kristiani, E., Liu, JC. (2021). Using Spark Distributed Deep Learning to Analyze NetFlow in Data Lake System. In: Chang, JW., Yen, N., Hung, J.C. (eds) Frontier Computing. FC 2020. Lecture Notes in Electrical Engineering, vol 747. Springer, Singapore. https://doi.org/10.1007/978-981-16-0115-6_20
DOI: https://doi.org/10.1007/978-981-16-0115-6_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0114-9
Online ISBN: 978-981-16-0115-6
eBook Packages: Computer Science, Computer Science (R0)