A machine learning approach for feature selection traffic classification using security analysis

Shafiq, Muhammad; Yu, Xiangzhan; Bashir, Ali Kashif; Chaudhry, Hassan Nazeer; Wang, Dawei

doi:10.1007/s11227-018-2263-3

Published: 27 January 2018

Volume 74, pages 4867–4892, (2018)

The Journal of Supercomputing

Muhammad Shafiq¹,
Xiangzhan Yu¹,
Ali Kashif Bashir²,
Hassan Nazeer Chaudhry³ &
Dawei Wang⁴

2105 Accesses
87 Citations

Abstract

Class imbalance is a major cause of inaccurate traffic classification. Accurate classification of traffic flows supports security monitoring, IP management, intrusion detection, and related tasks. Machine learning (ML) approaches are widely used in the literature to address the traffic classification problem. In this paper, we propose an ML-based hybrid feature selection algorithm named WMI_AUC that makes use of two metrics: weighted mutual information (WMI) and the area under the ROC curve (AUC). These metrics select effective features from a traffic flow. To then select robust features from among the selected features, we propose a robust feature selection algorithm. The proposed approach increases the accuracy of ML classifiers and helps in detecting malicious traffic. We evaluate our work using 11 well-known ML classifiers on trace datasets from different network environments. Experimental results show that our algorithms achieve more than 95% flow accuracy.
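
To illustrate how two such metrics might be combined, the following is a minimal sketch and not the paper's WMI_AUC algorithm. The abstract does not define the weighting or the combination rule, so everything here is an assumption: the inverse-class-frequency weights, the `min_auc` threshold, the rule "filter by AUC, then rank by WMI", and all function and feature names are hypothetical. Features are assumed to be already discretized.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: rarer classes get larger weights (mean weight is 1)."""
    counts = Counter(labels)
    raw = {c: len(labels) / k for c, k in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}


def weighted_mutual_information(feature, labels, class_weights):
    """Mutual information between a discrete feature and the class label,
    with each term scaled by the weight of its class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    wmi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written in terms of raw counts
        ratio = p_xy * n * n / (count_x[x] * count_y[y])
        wmi += class_weights[y] * p_xy * math.log2(ratio)
    return wmi


def auc(feature, labels, positive):
    """AUC of the raw feature value as a score for `positive` versus the rest,
    computed as the Mann-Whitney statistic (ties count one half)."""
    pos = [v for v, y in zip(feature, labels) if y == positive]
    neg = [v for v, y in zip(feature, labels) if y != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    a = wins / (len(pos) * len(neg))
    return max(a, 1.0 - a)  # the direction of the feature does not matter


def select_features(columns, labels, positive, k, min_auc=0.6):
    """Assumed hybrid rule: drop features below min_auc, rank the rest by WMI."""
    weights = inverse_frequency_weights(labels)
    scored = []
    for name, col in columns.items():
        if auc(col, labels, positive) >= min_auc:
            score = weighted_mutual_information(col, labels, weights)
            scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy, imbalanced example: 8 "web" flows and 2 "attack" flows (made-up data).
labels = ["web"] * 8 + ["attack"] * 2
columns = {
    "pkt_size_bin": [0] * 8 + [1] * 2,            # separates the classes
    "port_bin": [0] * 7 + [1] * 3,                # mostly separates them
    "noise": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],      # unrelated to the class
}
selected = select_features(columns, labels, positive="attack", k=2)
print(selected)
```

In this toy setup the minority-class weight makes agreement with the rare "attack" flows count for more than agreement with the majority class, which is one plausible way a weighted metric could counteract class imbalance.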

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61571144.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Muhammad Shafiq & Xiangzhan Yu
Faculty of Science and Technology, University of Faroe Islands, Faroe Islands, Denmark
Ali Kashif Bashir
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
Hassan Nazeer Chaudhry
National Computer Network Emergency Response Technical Team/Coordination Center, Beijing, China
Dawei Wang

Corresponding author

Correspondence to Muhammad Shafiq.

About this article

Cite this article

Shafiq, M., Yu, X., Bashir, A.K. et al. A machine learning approach for feature selection traffic classification using security analysis. J Supercomput 74, 4867–4892 (2018). https://doi.org/10.1007/s11227-018-2263-3

Published: 27 January 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11227-018-2263-3
