Information Systems Frontiers, Volume 12, Issue 2, pp 149–156

An SVM-based machine learning method for accurate internet traffic classification

Abstract

Accurate and timely traffic classification is critical in network security monitoring and traffic engineering. Traditional methods based on port numbers and protocols have proven ineffective in the presence of dynamic port allocation and packet encapsulation. Signature matching methods, on the other hand, require a known signature set and processing of the packet payload, and can handle the signatures of only a limited number of IP packets in real time. This paper proposes a machine learning method based on SVM (support vector machine) for accurate Internet traffic classification. The method classifies Internet traffic into broad application categories according to network flow parameters obtained from the packet headers. An optimized feature set is obtained via multiple classifier selection methods. Experimental results using traffic from a campus backbone show that an accuracy of 99.42% is achieved with the regular biased training and testing samples, and an accuracy of 97.17% is achieved when unbiased training and testing samples are used with the same feature set. Furthermore, as all the feature parameters are computable from the packet headers, the proposed method is also applicable to encrypted network traffic.
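
As a rough illustration of what "flow parameters obtained from the packet headers" could look like in practice, the following Python sketch groups packet-header records into bidirectional flows and computes a few per-flow statistics. The record layout, the sample packets, and the feature names (`packet_count`, `total_bytes`, `mean_packet_size`, `duration_s`) are assumptions made for this example; they are not the optimized feature set of the paper, which the abstract does not enumerate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical header-only records:
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes)
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 51000, 443, "TCP", 60),
    (0.05, "10.0.0.2", "10.0.0.1", 443, 51000, "TCP", 1500),
    (0.20, "10.0.0.1", "10.0.0.2", 51000, 443, "TCP", 100),
    (0.10, "10.0.0.3", "10.0.0.4", 40000, 53, "UDP", 80),
]


def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent 5-tuple, so both directions map to one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def flow_features(packets):
    """Aggregate packets into flows and compute illustrative features."""
    flows = defaultdict(list)
    for ts, sip, dip, sp, dp, proto, length in packets:
        flows[flow_key(sip, dip, sp, dp, proto)].append((ts, length))

    features = {}
    for key, pkts in flows.items():
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_packet_size": mean(sizes),
            "duration_s": max(times) - min(times),
        }
    return features


features = flow_features(packets)
```

No payload bytes are read at any point, which is why features of this kind remain computable for encrypted traffic. In a pipeline of the sort the abstract describes, each per-flow feature dictionary would be turned into a numeric vector and passed to an SVM classifier; that step is omitted here.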

Keywords

Internet traffic; Network traffic classification; Machine learning; Feature selection; SVM

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Center for Intelligent and Networked Systems, TNLIST Lab, Tsinghua University, Beijing, China
  2. MOE KLINNS Lab and SKLMS Lab, Xi’an Jiaotong University, Xi’an, China
  3. College of Economics and Management, Beijing Jiaotong University, Beijing, China
  4. Department of Information Technology and Decision Science, Old Dominion University, Norfolk, USA