Soft Computing, Volume 21, Issue 8, pp 1963–1975

A novel semi-supervised learning method for Internet application identification

  • Zhenxiang Chen
  • Zhusong Liu
  • Lizhi Peng
  • Lin Wang
  • Lei Zhang
Methodologies and Application


Several methods based on port, payload, and transport layer features have been proposed to detect, identify, and manage Internet traffic. The diminished effectiveness of port-based identification and the overhead of deep packet inspection motivated us to identify Internet traffic by combining distinctive flow characteristics with machine learning. However, abundant ground-truth Internet traffic, which is important for building a supervised classifier, is difficult to obtain in real conditions. In this study, we propose a semi-supervised learning method that combines the further division of recognition space technique with data gravitation theory. The further division of recognition space classifier is a powerful multi-class classification tool that is well suited to identifying multiple applications. Data gravitation can reveal the underlying structure of the data space from unlabeled data, so it is integrated into the classification process to build a better classifier. Experimental results on real Internet application traffic datasets demonstrate the advantages of the proposed method.
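The abstract does not spell out how data gravitation is computed, so the following is only a minimal sketch of the general idea, assuming an inverse-square pull between unit-mass data points and assigning a sample to the class that pulls on it most strongly. The function names, the two-dimensional feature vectors, and the labels "web" and "p2p" are hypothetical illustrations, not the authors' method or data.

```python
def gravitation(point, particles, eps=1e-9):
    """Total pull on `point` from (position, mass) particles.

    Assumption: inverse-square form, sum of mass / distance^2.
    `eps` avoids division by zero when a particle coincides with the point.
    """
    total = 0.0
    for position, mass in particles:
        r2 = sum((a - b) ** 2 for a, b in zip(point, position))
        total += mass / (r2 + eps)
    return total


def classify(point, labeled):
    """Return the label whose particles exert the largest gravitation."""
    by_class = {}
    for position, label in labeled:
        # Assumption: every labeled sample is a particle of unit mass.
        by_class.setdefault(label, []).append((position, 1.0))
    return max(by_class, key=lambda lab: gravitation(point, by_class[lab]))


# Hypothetical, normalized two-feature flow samples.
labeled_flows = [
    ((0.1, 0.2), "web"),
    ((0.2, 0.1), "web"),
    ((0.9, 0.8), "p2p"),
    ((0.8, 0.9), "p2p"),
]

print(classify((0.15, 0.15), labeled_flows))
```

In a semi-supervised setting, unlabeled samples could also contribute to the particles that shape the data space; how the paper does this is not stated in the abstract, so the sketch stops at the supervised case.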


Semi-supervised learning · Recognition space · Data gravitation · Internet traffic classification



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Zhenxiang Chen (1)
  • Zhusong Liu (2)
  • Lizhi Peng (1)
  • Lin Wang (1)
  • Lei Zhang (1)

  1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
  2. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
