Adaptive and augmented active anomaly detection on dynamic network traffic streams

Li, Bin; Wang, Yijie; Cheng, Li

doi:10.1631/FITEE.2300244

Research Article
Published: 23 March 2024

Volume 25, pages 446–460 (2024)

Frontiers of Information Technology & Electronic Engineering

Abstract

Active anomaly detection queries the labels of sampled instances and uses them to incrementally update the detection model; it has been widely adopted for detecting network attacks. However, existing methods cannot achieve desirable performance on dynamic network traffic streams because (1) their query strategies cannot sample informative instances that let the detection model adapt to the evolving stream and (2) their model updating relies only on the limited queried instances and fails to leverage the enormous number of unlabeled instances in the stream. To address these issues, we propose an active tree-based model, the adaptive and augmented active prior-knowledge forest (A³PF), for anomaly detection on network traffic streams. A prior-knowledge forest is constructed using prior knowledge of network attacks to find feature subspaces that better distinguish network anomalies from normal traffic. On the one hand, to make the model adapt to the evolving stream, a novel adaptive query strategy is designed to sample informative instances from two aspects: the changes in the dynamic data distribution and the uncertainty of anomalies. On the other hand, based on the similarity of instances in a neighborhood, we devise an augmented update method that generates pseudo labels for the unlabeled neighbors of queried instances, which makes it possible to use the enormous number of unlabeled instances during model updating. Extensive experiments on two benchmarks, CIC-IDS2017 and UNSW-NB15, demonstrate that A³PF achieves significant improvements over previous active methods: the average area under the receiver operating characteristic curve (AUC-ROC) improves by 20.9% and 21.5% on the two benchmarks, respectively, and the average area under the precision-recall curve (AUC-PR) improves by 44.6% and 64.1%, respectively.
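
The augmented update described in the abstract can be illustrated with a minimal sketch. It is not the authors' A³PF implementation: the function name `pseudo_label_neighbors`, the use of Euclidean distance, and the parameters `k` and `max_dist` are all assumptions chosen for illustration, since the abstract does not specify how neighborhoods are defined. The sketch copies the label of each queried instance to its nearest unlabeled neighbors, which is the general idea of generating pseudo labels from neighborhood similarity.

```python
import math


def pseudo_label_neighbors(queried, unlabeled, k=2, max_dist=1.0):
    """Propagate each queried label to its k nearest unlabeled points.

    Hypothetical helper, not taken from the paper.
    queried:   list of (point, label) pairs; label 1 = anomaly, 0 = normal
    unlabeled: list of points (tuples of floats)
    Returns a dict mapping an index into `unlabeled` to its pseudo label.
    """
    best = {}  # index -> (distance, label) of the nearest queried instance
    for point, label in queried:
        # Rank unlabeled points by Euclidean distance to the queried one.
        ranked = sorted((math.dist(point, u), i) for i, u in enumerate(unlabeled))
        for dist, i in ranked[:k]:
            if dist > max_dist:
                break  # the remaining candidates are even farther away
            # If two queried instances compete for a point, the nearer one wins.
            if i not in best or dist < best[i][0]:
                best[i] = (dist, label)
    return {i: label for i, (_, label) in best.items()}


# Toy data: one queried normal instance and one queried anomaly.
queried = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
unlabeled = [(0.1, 0.0), (0.2, 0.1), (5.1, 4.9), (9.0, 9.0)]
pseudo = pseudo_label_neighbors(queried, unlabeled, k=2, max_dist=1.0)
print(pseudo)
```

In this toy run, the two points near the origin inherit the normal label, the point near (5, 5) inherits the anomaly label, and the distant point (9, 9) stays unlabeled because it lies outside `max_dist`. The pseudo-labeled points could then be added to the queried instances when the model is updated; how A³PF weights or filters them is not stated in the abstract.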

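Abstract (translated from the Chinese)

Active anomaly detection incrementally updates the detection model by querying the labels of sampled instances, and it has been widely used to detect network attacks. However, existing methods cannot achieve the expected performance on dynamic network traffic because (1) their query strategies cannot sample informative traffic that would let the detection model adapt to network traffic whose data distribution keeps changing, and (2) their model updates rely only on the limited queried traffic and cannot exploit the huge amount of unlabeled traffic in the stream. To solve these problems, an adaptive and augmented active prior-knowledge forest model, A³PF, is proposed for anomaly detection on network traffic. Using prior knowledge of network attacks, feature subspaces that better separate anomalous from normal network traffic are identified, and the prior-knowledge forest is built from them. On the one hand, to make the model adapt to constantly changing network traffic, a new adaptive query strategy is designed that samples informative traffic from two aspects: changes in the dynamic data distribution and the uncertainty of anomalies. On the other hand, based on the similarity of network traffic within a neighborhood, an augmented update method is designed that generates pseudo labels for the unlabeled neighbors of queried traffic, so that the large volume of unlabeled traffic can be fully used when the anomaly detection model is updated. Extensive experiments on two intrusion detection datasets, CIC-IDS2017 and UNSW-NB15, show that A³PF significantly outperforms related methods. Specifically, its average AUC-ROC improves by 20.9% and 21.5%, respectively, and its average AUC-PR improves by 44.6% and 64.1%, respectively.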
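
The two evaluation metrics named in the abstract, AUC-ROC and AUC-PR, can be computed from anomaly scores and ground-truth labels. The following is a minimal pure-Python sketch, not the evaluation code used in the paper. It assumes binary labels (1 = anomaly, 0 = normal), computes AUC-ROC as the fraction of anomaly/normal pairs that are ranked correctly, and estimates AUC-PR with average precision, which is one common estimator; the paper may use a different one. The function names and the toy data are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random anomaly outscores a random normal.

    Ties between an anomaly score and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal instance")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true anomaly.

    Instances are ranked by descending score; tied scores keep input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one anomaly")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data: higher score means "more anomalous".
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
print(roc)  # 0.75
print(ap)   # approximately 0.833 (5/6)
```

AUC-PR is usually the more sensitive of the two metrics when anomalies are rare, because precision drops quickly once normal traffic is ranked above attacks. This is consistent with the AUC-PR gains in the abstract being larger than the AUC-ROC gains, although the abstract itself does not offer that explanation.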

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

National Key Laboratory of Parallel and Distributed Computing, College of Computer, National University of Defense Technology, Changsha, 410073, China
Bin Li (李彬) & Yijie Wang (王意洁)
College of System Engineering, National University of Defense Technology, Changsha, 410073, China
Li Cheng (程力)

Contributions

Bin LI designed the research, processed the data, and drafted the paper. Yijie WANG and Li CHENG helped organize the paper. Bin LI, Yijie WANG, and Li CHENG revised and finalized the paper.

Corresponding author

Correspondence to Yijie Wang (王意洁).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Science and Technology Major Project (No. 2022ZD0115302), the National Natural Science Foundation of China (No. 61379052), the Science Foundation of Ministry of Education of China (No. 2018A02002), and the Natural Science Foundation for Distinguished Young Scholars of Hunan Province, China (No. 14JJ1026).

About this article

Cite this article

Li, B., Wang, Y. & Cheng, L. Adaptive and augmented active anomaly detection on dynamic network traffic streams. Front Inform Technol Electron Eng 25, 446–460 (2024). https://doi.org/10.1631/FITEE.2300244

Received: 08 April 2023
Accepted: 04 July 2023
Published: 23 March 2024
Issue Date: March 2024
DOI: https://doi.org/10.1631/FITEE.2300244

CLC number

TP309
