Traffic data extraction and labeling for machine learning based attack detection in IoT networks

Gebrye, Hayelom; Wang, Yong; Li, Fagen

doi:10.1007/s13042-022-01765-7

Original Article
Published: 05 January 2023

Volume 14, pages 2317–2332 (2023)

International Journal of Machine Learning and Cybernetics

Abstract

The rapid expansion of Internet of Things (IoT) networks increases the potential for network threats. Network traffic analysis has become an increasingly critical tool for monitoring network traffic in general and for analyzing attack patterns in particular. A few years ago, distributed denial-of-service (DDoS) attacks on IoT networks were considered the most pressing problem to be addressed. The absence of high-quality datasets is one of the main obstacles to applying machine learning-based DDoS detection systems. Researchers have developed numerous tools to extract and analyze information from recorded capture files, but a review of the literature shows that most of these tools share similar drawbacks. In this study, we propose an intelligent raw network data extraction and labeling tool that addresses the limitations of existing tools for converting PCAP files to CSV. To generate a high-quality DDoS attack dataset suitable for machine learning models, we applied several data preprocessing operations to the selected network intrusion dataset. To confirm the validity and acceptability of the resulting dataset, we tested several models; among them, random forest was the most accurate in detecting DDoS attacks.
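The abstract describes converting raw PCAP captures into labeled CSV records suitable for machine learning. As a rough illustration only, the Python sketch below reads a classic libpcap file (microsecond timestamps, Ethernet link type), decodes IPv4 addresses, the protocol number, and TCP/UDP ports, and writes one CSV row per packet. The column set, the `attack`/`benign` labels, the rule of labeling packets by a set of known attacker source IPs, and the helper `build_demo_pcap` with its addresses are all hypothetical assumptions for this example; this is not the authors' extractor or their labeling procedure.

```python
import csv
import io
import ipaddress
import struct
from typing import BinaryIO, Dict, Iterator, Optional, Set, TextIO, Tuple, Union

LINKTYPE_ETHERNET = 1      # libpcap link-layer header type for Ethernet
ETHERTYPE_IPV4 = 0x0800
# Hypothetical CSV schema chosen for this sketch, not the paper's feature set.
FIELDS = ["ts", "src_ip", "dst_ip", "protocol", "src_port", "dst_port", "length", "label"]

Row = Dict[str, Union[str, int]]


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes, int]]:
    """Yield (timestamp, captured bytes, original length) for each record of a classic libpcap file."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"   # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a microsecond-resolution libpcap file")
    linktype = struct.unpack(endian + "I", header[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            return
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # stop at a truncated final record
        yield ts_sec + ts_usec / 1e6, data, orig_len


def packet_to_row(ts: float, frame: bytes, orig_len: int, attackers: Set[str]) -> Optional[Row]:
    """Decode Ethernet/IPv4 (+TCP/UDP ports) into one labeled row; skip other frames."""
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4           # IPv4 header length in bytes
    proto = ip[9]                      # 6 = TCP, 17 = UDP
    src = str(ipaddress.IPv4Address(ip[12:16]))
    dst = str(ipaddress.IPv4Address(ip[16:20]))
    src_port: Union[str, int] = ""
    dst_port: Union[str, int] = ""
    if proto in (6, 17) and len(ip) >= ihl + 4:
        src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return {
        "ts": f"{ts:.6f}", "src_ip": src, "dst_ip": dst, "protocol": proto,
        "src_port": src_port, "dst_port": dst_port, "length": orig_len,
        # Assumed labeling rule: any packet sent by a known attacker IP is an attack.
        "label": "attack" if src in attackers else "benign",
    }


def pcap_to_csv(pcap: BinaryIO, out: TextIO, attackers: Set[str]) -> int:
    """Convert a pcap stream into labeled CSV rows; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for ts, frame, orig_len in read_pcap(pcap):
        row = packet_to_row(ts, frame, orig_len, attackers)
        if row is not None:
            writer.writerow(row)
            written += 1
    return written


def build_demo_pcap() -> bytes:
    """Hypothetical one-packet capture: a UDP datagram from 192.168.0.13 to 192.168.0.1:53."""
    global_hdr = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, LINKTYPE_ETHERNET)
    eth = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_IPV4)
    ipv4 = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                       ipaddress.IPv4Address("192.168.0.13").packed,
                       ipaddress.IPv4Address("192.168.0.1").packed)
    udp = struct.pack("!HHHH", 5353, 53, 8, 0)
    frame = eth + ipv4 + udp
    record_hdr = struct.pack("<IIII", 1, 500000, len(frame), len(frame))
    return global_hdr + record_hdr + frame


if __name__ == "__main__":
    out = io.StringIO()
    pcap_to_csv(io.BytesIO(build_demo_pcap()), out, attackers={"192.168.0.13"})
    print(out.getvalue())
```

Running the module writes a header plus one row labeled `attack`. The sketch works at packet level; flow-level features such as per-connection packet counts or durations would require grouping these rows by the (source, destination, ports, protocol) 5-tuple, and pcapng files or non-Ethernet link types would need a different reader.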

Data availability

The data that support the findings of this study are openly available in IEEE Dataport at https://doi.org/10.21227/q70p-q449, reference number [27], and in Mendeley Data at https://doi.org/10.17632/h38nhgcpgk.1, reference number [48].

References

[1] Roopak M, Tian GY, Chambers J (2019) Deep learning models for cyber security in IoT networks. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), pp 0452–0457. IEEE
[2] Iglesias F, Zseby T (2015) Analysis of network traffic features for anomaly detection. Mach Learn 101(1):59–84
[3] Frank Jr CV (2019) Mirai bot scanner summation prototype
[4] Orebaugh A, Ramirez G, Beale J (2006) Wireshark & Ethereal network protocol analyzer toolkit. Elsevier
[5] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
[6] Wei T, Li X, Stojanovic V (2021) Input-to-state stability of impulsive reaction-diffusion neural networks with infinite distributed delays. Nonlinear Dyn 103(2):1733–1755
[7] Tao H, Li J, Chen Y, Stojanovic V, Yang H (2020) Robust point-to-point iterative learning control with trial-varying initial conditions. IET Control Theory Appl 14(19):3344–3350
[8] Xu Z, Li X, Stojanovic V (2021) Exponential stability of nonlinear state-dependent delayed impulsive systems with applications. Nonlinear Anal Hybrid Syst 42:101088
[9] Gregg B (2004) Chaosreader. http://www.brendangregg.com/chaosreader.html. Accessed 23 Oct 2021
[10] Soderberg W (2010) Extracting files from a capture aka intercepting files. https://wh1sk3yj4ck.wordpress.com/2010/08/12/extracting-files-from-a-capturefile-aka-intercepting-files/. Accessed 23 Oct 2021
[11] B P, N H (2005) Tcpxtract home page. http://tcpxtract.sourceforge.net/. Accessed 23 Oct 2021
[12] Davidoff S, Ham J (2012) Network forensics: tracking hackers through cyberspace, vol 2014. Prentice Hall, Upper Saddle River
[13] Deck S, Khiabani H (2015) Extracting files from network packet captures. SANS Institute InfoSec Reading Room
[14] Joshi M, Hadi TH (2015) A review of network traffic analysis and prediction techniques. arXiv preprint arXiv:1507.05722
[15] Alothman B (2019) Raw network traffic data preprocessing and preparation for automatic analysis. In: 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), pp 1–5. IEEE
[16] Draper-Gil G, Lashkari AH, Mamun MSI, Ghorbani AA (2016) Characterization of encrypted and VPN traffic using time-related features. In: Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), pp 407–414
[17] Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp 1–6. IEEE
[18] Kayacık HG, Zincir-Heywood AN, Heywood MI. Selecting features for intrusion detection: a feature relevance analysis on KDD 99 benchmark
[19] Flood R. A data-driven toolset using containers to generate datasets for network intrusion detection
[20] Mukkavilli SK, Shetty S, Hong L et al (2016) Generation of labelled datasets to quantify the impact of security threats to cloud data centers. J Inf Secur 7(03):172
[21] Alzahrani S, Hong L (2018) Generation of DDoS attack dataset for effective IDS development and evaluation. J Inf Secur 9(4):225–241
[22] Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417
[23] Izenman A (2013) Linear discriminant analysis. In: Modern multivariate statistical techniques, pp 237–280. Springer, New York
[24] Legendre P, De Cáceres M (2013) Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecol Lett 16(8):951–963
[25] Wenskovitch J, Crandell I, Ramakrishnan N, House L, North C (2017) Towards a systematic combination of dimension reduction and clustering in visual analytics. IEEE Trans Visual Comput Graphics 24(1):131–141
[26] Fujiwara T, Kwon O-H, Ma K-L (2019) Supporting analysis of dimensionality reduction results with contrastive learning. IEEE Trans Visual Comput Graphics 26(1):45–55
[27] Kang H, Ahn DH, Lee GM, Yoo JD, Park KH, Kim HK (2019) IoT Network Intrusion Dataset. IEEE Dataport. https://doi.org/10.21227/q70p-q449
[28] G H (2015) Libpcap file format. https://wiki.wireshark.org/Development/LibpcapFileFormat. Accessed 25 Oct 2021
[29] Holland T (2004) Understanding IPS and IDS: using IPS and IDS together for defense in depth. SANS Institute
[30] Heidemann J, Mirkovic J, Hardaker W, Kallitsis M (2021) Collecting, labeling, and using networking data: the intersection of AI and networking
[31] Fukuda K, Heidemann J, Qadeer A (2017) Detecting malicious activity with DNS backscatter over time. IEEE/ACM Trans Netw 25(5):3203–3218
[32] Sorzano COS, Vargas J, Montano AP (2014) A survey of dimensionality reduction techniques. arXiv preprint arXiv:1403.2877
[33] Csubák D, Szücs K, Vörös P, Kiss A (2016) Big data testbed for network attack detection. Acta Polytechnica Hungarica 13(2):47–57
[34] Cunningham RK, Lippmann RP, Fried DJ, Garfinkel SL, Graf I, Kendall KR, Webster SE, Wyschogrod D, Zissman MA (1999) Evaluating intrusion detection systems without attacking your friends: the 1998 DARPA intrusion detection evaluation. Technical report, Massachusetts Institute of Technology Lexington Lincoln Lab
[35] Haines JW, Rossey LM, Lippmann RP, Cunningham RK (2001) Extending the DARPA off-line intrusion detection evaluations. In: Proceedings DARPA Information Survivability Conference and Exposition II (DISCEX’01), vol 1, pp 35–45. IEEE
[36] Lee W, Stolfo SJ (2000) A framework for constructing features and models for intrusion detection systems. ACM Trans Inf Syst Secur (TISSEC) 3(4):227–261
[37] Sperotto A, Sadre R, van Vliet F, Pras A (2009) A labeled data set for flow-based intrusion detection. In: International Workshop on IP Operations and Management, pp 39–50. Springer
[38] Sangster B, O’Connor T, Cook T, Fanelli R, Dean E, Morrell C, Conti GJ (2009) Toward instrumenting network warfare competitions to generate labeled datasets. In: CSET
[39] Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA (2012) Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 31(3):357–374
[40] Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp 1–6. IEEE
[41] Kolias C, Kambourakis G, Stavrou A, Gritzalis S (2015) Intrusion detection in 802.11 networks: empirical evaluation of threats and a public dataset. IEEE Commun Surv Tutor 18(1):184–208
[42] Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP 1:108–116
[43] Lashkari AH, Draper-Gil G, Mamun MSI, Ghorbani AA et al (2017) Characterization of Tor traffic using time based features. In: ICISSP, pp 253–262
[44] Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST), pp 1–8. IEEE
[45] Koroniotis N, Moustafa N, Sitnikova E, Turnbull B (2019) Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset. Futur Gener Comput Syst 100:779–796
[46] Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-BaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput 17(3):12–22
[47] Ullah I, Mahmoud QH (2020) A scheme for generating a dataset for anomalous activity detection in IoT networks. In: Canadian Conference on Artificial Intelligence, pp 508–520. Springer
[48] Hayelom G (2022) Mirai Based DDOS Dataset. Mendeley Data. https://doi.org/10.17632/h38nhgcpgk.1
[49] Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737
[50] Haykin S (2004) Neural networks: a comprehensive foundation
[51] Xue F, Qu A (2017) Variable selection for highly correlated predictors. arXiv preprint arXiv:1709.04840
[52] Kotsiantis SB, Zaharakis I, Pintelas P et al (2007) Supervised machine learning: a review of classification techniques. Emerging artificial intelligence applications in computer engineering 160(1):3–24
[53] Friedman J, Hastie T, Tibshirani R et al (2001) The elements of statistical learning, vol 1. Springer Series in Statistics, New York
[54] Hussain J, Lalmuanawma S (2016) Feature analysis, evaluation and comparisons of classification algorithms based on noisy intrusion dataset. Proc Comput Sci 92:188–198
[55] Fenanir S, Semchedine F, Baadache A (2019) A machine learning-based lightweight intrusion detection system for the internet of things. Rev d’Intelligence Artif 33(3):203–211
[56] Abrar I, Ayub Z, Masoodi F, Bamhdi AM (2020) A machine learning approach for intrusion detection system on NSL-KDD dataset. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp 919–924. IEEE
[57] Ashraf S, Ahmed T (2020) Sagacious intrusion detection strategy in sensor network. In: 2020 International Conference on UK-China Emerging Technologies (UCET), pp 1–4. IEEE
[58] Belavagi MC, Muniyal B (2016) Performance evaluation of supervised machine learning algorithms for intrusion detection. Proc Comput Sci 89:117–123
[59] Ullah S, Ahmad J, Khan MA, Alkhammash EH, Hadjouni M, Ghadi YY, Saeed F, Pitropakis N (2022) A new intrusion detection system for the internet of things via deep convolutional neural network and feature engineering. Sensors 22(10):3607

Author information

Authors and Affiliations

Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China
Hayelom Gebrye, Yong Wang & Fagen Li
Information Technology, Raya University, Maychew, Ethiopia
Hayelom Gebrye

Corresponding author

Correspondence to Hayelom Gebrye.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gebrye, H., Wang, Y. & Li, F. Traffic data extraction and labeling for machine learning based attack detection in IoT networks. Int. J. Mach. Learn. & Cyber. 14, 2317–2332 (2023). https://doi.org/10.1007/s13042-022-01765-7

Received: 12 February 2022
Accepted: 26 December 2022
Published: 05 January 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s13042-022-01765-7
