A novel semi-supervised learning method for Internet application identification

  • Methodologies and Application
Soft Computing

Abstract

Several methods based on port numbers, payload, and transport-layer features have been proposed to detect, identify, and manage Internet traffic. The diminished effectiveness of port-based identification and the overhead of deep packet inspection motivated us to identify Internet traffic by combining distinctive flow characteristics with machine learning. However, abundant ground-truth Internet traffic, which is essential for building a supervised classifier, is difficult to obtain in real conditions. In this study, we propose a semi-supervised learning method that combines the further division of recognition space technique with data gravitation theory. The further division of recognition space classifier is a powerful multi-class classification tool that is well suited to identifying multiple applications. Data gravitation can reveal the underlying structure of the data space from unlabeled data, so it is integrated into the classification process to build a better classifier. Experimental results on real Internet application traffic datasets demonstrate the advantages of the proposed method.
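To make the data gravitation idea more concrete, here is a minimal sketch of a generic gravitation-based decision rule, not the authors' method: the abstract does not specify how masses are defined or how gravitation is combined with the further division of recognition space classifier. The sketch assumes every labeled flow is a data particle (unit mass by default) and that a particle pulls a sample with an inverse-square force, F_i = m_i / r_i^2. The sample is assigned to the class whose particles exert the largest total pull. The function names `class_gravitation` and `gravitation_classify` and the toy two-dimensional "flow features" are hypothetical and are included only for illustration.

```python
import numpy as np


def class_gravitation(x, points, masses=None, eps=1e-12):
    """Total gravitation exerted on sample x by a set of data particles.

    Assumption: each particle i pulls with F_i = m_i / r_i**2, where the
    sample's own mass is taken as 1. eps avoids division by zero when x
    coincides exactly with a particle.
    """
    points = np.asarray(points, dtype=float)
    x = np.asarray(x, dtype=float)
    if masses is None:
        masses = np.ones(len(points))  # unit mass per labeled flow
    sq_dist = np.sum((points - x) ** 2, axis=1)  # r_i**2 for every particle
    return float(np.sum(masses / (sq_dist + eps)))


def gravitation_classify(x, labeled):
    """Assign x to the class whose particles exert the largest total gravitation.

    `labeled` maps a class label to an array of feature vectors (one per flow).
    Returns the winning label and the per-class forces.
    """
    forces = {label: class_gravitation(x, pts) for label, pts in labeled.items()}
    return max(forces, key=forces.get), forces


# Toy 2-D "flow features" (hypothetical values, for illustration only).
training = {
    "web": np.array([[0.0, 0.0], [0.0, 1.0]]),
    "p2p": np.array([[5.0, 5.0], [6.0, 5.0]]),
}
label, forces = gravitation_classify([1.0, 0.0], training)
print(label)  # web
```

Because the pull falls off with squared distance, dense nearby regions dominate the decision. In a semi-supervised setting, this is the property that lets unlabeled flows, once treated as particles, expose cluster structure. This sketch covers only the labeled-data decision rule.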

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 60903176 and No. 61472164), the Natural Science Foundation of Shandong Province (No. ZR2014JL042), and the Program for Youth Science and Technology Star Foundation of Jinan (No. TNK1108).

Author information

Corresponding author

Correspondence to Zhenxiang Chen.

Ethics declarations

Conflict of interest

None

Additional information

Communicated by V. Loia.

About this article

Cite this article

Chen, Z., Liu, Z., Peng, L. et al. A novel semi-supervised learning method for Internet application identification. Soft Comput 21, 1963–1975 (2017). https://doi.org/10.1007/s00500-015-1892-1

  • DOI: https://doi.org/10.1007/s00500-015-1892-1
