Abstract
Several methods based on port, payload, and transport-layer features have been proposed to detect, identify, and manage Internet traffic. The diminishing effectiveness of port-based identification and the overhead of deep packet inspection motivated us to identify Internet traffic by combining distinctive flow characteristics with machine learning. However, abundant ground-truth Internet traffic, which is important for building a supervised classifier, is difficult to obtain in real conditions. In this study, we propose a semi-supervised learning method that combines the further division of recognition space technique with data gravitation theory. The further division of recognition space classifier is a powerful multi-class classification tool that is well suited to identifying multiple applications. Data gravitation may reveal the underlying structure of the data space from unlabeled data, so it is integrated into the classification to build a better classifier. Experimental results on real Internet application traffic datasets demonstrate the advantages of the proposed method.
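The abstract does not give the details of the gravitation step, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that every labeled flow acts as a unit-mass particle, that the attraction on a new flow is mass divided by squared Euclidean distance, and that an unlabeled flow receives the label of the class with the largest total attraction. The function names, the two-dimensional feature vectors, and the class names `web` and `p2p` are hypothetical.

```python
from collections import defaultdict


def gravitation(point, particle, mass=1.0, eps=1e-12):
    """Attraction of one particle on a point: mass / squared distance (assumed form)."""
    r2 = sum((a - b) ** 2 for a, b in zip(point, particle))
    return mass / (r2 + eps)  # eps avoids division by zero for identical points


def classify(point, labeled):
    """Return the label of the class whose particles attract the point most strongly."""
    totals = defaultdict(float)
    for features, label in labeled:
        totals[label] += gravitation(point, features)
    return max(totals, key=totals.get)


def pseudo_label(labeled, unlabeled):
    """Assign a provisional label to each unlabeled flow (assumed semi-supervised step)."""
    return [(x, classify(x, labeled)) for x in unlabeled]


# Hypothetical flow features and application labels, for illustration only.
labeled_flows = [((0.0, 0.0), "web"), ((0.0, 1.0), "web"), ((10.0, 10.0), "p2p")]
unlabeled_flows = [(1.0, 1.0), (9.0, 9.0)]
provisional = pseudo_label(labeled_flows, unlabeled_flows)
```

In a semi-supervised setting, the provisionally labeled flows could then be added to the training set of a supervised classifier; how the proposed method actually integrates gravitation with the further division of recognition space classifier is not specified here.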
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 60903176 and 61472164), the Natural Science Foundation of Shandong Province (No. ZR2014JL042), and the Program for Youth Science and Technology Star Foundation of Jinan (No. TNK1108).
Ethics declarations
Conflict of interest
None
Additional information
Communicated by V. Loia.
About this article
Cite this article
Chen, Z., Liu, Z., Peng, L. et al. A novel semi-supervised learning method for Internet application identification. Soft Comput 21, 1963–1975 (2017). https://doi.org/10.1007/s00500-015-1892-1
DOI: https://doi.org/10.1007/s00500-015-1892-1