A novel semi-supervised learning method for Internet application identification

Chen, Zhenxiang; Liu, Zhusong; Peng, Lizhi; Wang, Lin; Zhang, Lei

doi:10.1007/s00500-015-1892-1

Methodologies and Application
Published: 04 November 2015

Volume 21, pages 1963–1975, (2017)
Soft Computing

Zhenxiang Chen¹,
Zhusong Liu²,
Lizhi Peng¹,
Lin Wang¹ &
…
Lei Zhang¹

595 Accesses
8 Citations

Abstract

Several methods based on port, payload, and transport-layer features have been proposed to detect, identify, and manage Internet traffic. The diminishing effectiveness of port-based identification and the overhead of deep packet inspection motivated us to identify Internet traffic by combining distinctive flow characteristics with machine learning. However, abundant ground-truth traffic, which is essential for building a supervised classifier, is difficult to obtain under real conditions. In this study, we propose a semi-supervised learning method that combines the further division of recognition space technique with data gravitation theory. The further division of recognition space classifier is a powerful multi-class classification tool that suits multi-application identification. Data gravitation can reveal the underlying structure of the data space from unlabeled data, so it is integrated into the classifier to improve it. Experimental results on real Internet application traffic datasets demonstrate the advantages of the proposed method.
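
The data gravitation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only assumes the general form used in gravitation-based classification, where each labeled flow acts as a particle with a mass and pulls on a test flow with a force proportional to mass over squared distance, and the class with the largest total pull wins. The function name `gravitation_classify`, the two-dimensional feature vectors, and the labels `"web"` and `"p2p"` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def gravitation_classify(x, particles, eps=1e-12):
    """Return the label whose particles exert the largest total pull on x.

    particles is an iterable of (feature_vector, mass, label) tuples.
    This is an illustrative assumption, not the method from the article.
    """
    totals = defaultdict(float)
    for features, mass, label in particles:
        # Squared Euclidean distance between the test point and the particle.
        r2 = sum((a - b) ** 2 for a, b in zip(x, features))
        # Gravitation-like force on a unit-mass test point: m / r^2.
        # eps avoids division by zero when x coincides with a particle.
        totals[label] += mass / (r2 + eps)
    return max(totals, key=totals.get)


# Hypothetical flows described by two made-up, already normalized features.
particles = [
    ((0.0, 0.0), 1.0, "web"),
    ((0.0, 1.0), 1.0, "web"),
    ((5.0, 5.0), 1.0, "p2p"),
    ((5.0, 6.0), 1.0, "p2p"),
]
label = gravitation_classify((0.2, 0.3), particles)
```

In a semi-supervised setting, unlabeled flows could contribute to the particle set once they receive a provisional label, but how the article combines this with further division of recognition space is not described in the abstract, so the sketch stops at the single classification step.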

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 60903176 and 61472164), the Natural Science Foundation of Shandong Province (No. ZR2014JL042), and the Program for Youth Science and Technology Star Foundation of Jinan (No. TNK1108).

Author information

Authors and Affiliations

Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022, Shandong, China
Zhenxiang Chen, Lizhi Peng, Lin Wang & Lei Zhang
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
Zhusong Liu

Corresponding author

Correspondence to Zhenxiang Chen.

Ethics declarations

Conflict of interest

None

Additional information

Communicated by V. Loia.

About this article

Cite this article

Chen, Z., Liu, Z., Peng, L. et al. A novel semi-supervised learning method for Internet application identification. Soft Comput 21, 1963–1975 (2017). https://doi.org/10.1007/s00500-015-1892-1

Published: 04 November 2015
Issue Date: April 2017
DOI: https://doi.org/10.1007/s00500-015-1892-1
