An SVM-based machine learning method for accurate internet traffic classification

Information Systems Frontiers

Abstract

Accurate and timely traffic classification is critical to network security monitoring and traffic engineering. Traditional methods based on port numbers and protocols have proven ineffective in the presence of dynamic port allocation and packet encapsulation. Signature matching methods, on the other hand, require a known signature set and processing of the packet payload, and can handle the signatures of only a limited number of IP packets in real time. This paper proposes a machine learning method based on the SVM (support vector machine) for accurate Internet traffic classification. The method classifies Internet traffic into broad application categories according to network flow parameters obtained from the packet headers. An optimized feature set is obtained via multiple classifier selection methods. Experimental results using traffic from a campus backbone show that an accuracy of 99.42% is achieved with the regular biased training and testing samples. An accuracy of 97.17% is achieved when unbiased training and testing samples are used with the same feature set. Furthermore, because all the feature parameters are computable from the packet headers, the proposed method is also applicable to encrypted network traffic.
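
To make "network flow parameters obtained from the packet headers" concrete, the following is a minimal sketch of how header-only records might be grouped into flows and summarized as feature vectors before being passed to a classifier such as an SVM. The `PacketHeader` record layout, the `flow_features` function, and the five statistics computed here are illustrative assumptions; the abstract does not list the paper's actual feature set, and the classification step itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PacketHeader:
    """Header-only view of one packet (hypothetical record layout)."""
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str     # e.g. "TCP" or "UDP"
    length: int       # total packet length in bytes


def flow_features(packets):
    """Group packets into unidirectional flows by 5-tuple and summarize each.

    The statistics below are assumed examples of flow parameters, not the
    optimized feature set reported in the paper.
    """
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)

    features = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p.timestamp)
        sizes = [p.length for p in pkts]
        # Gaps between consecutive packets of the same flow.
        gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "byte_count": sum(sizes),
            "mean_packet_size": mean(sizes),
            "duration": pkts[-1].timestamp - pkts[0].timestamp,
            "mean_interarrival": mean(gaps) if gaps else 0.0,
        }
    return features


# Made-up sample: three packets of one TCP flow and a single UDP packet.
sample = [
    PacketHeader(0.0, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 100),
    PacketHeader(0.5, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 300),
    PacketHeader(0.2, "10.0.0.3", "10.0.0.2", 6000, 53, "UDP", 60),
    PacketHeader(1.0, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 200),
]
sample_features = flow_features(sample)
```

Nothing in this sketch reads the packet payload, which is why features of this kind remain computable when the traffic is encrypted.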



Acknowledgements

The research presented in this paper is supported in part by the NSFC (Grant numbers: 60243001, 60574087, 60605019, 60633020) and the 863 High-Tech Development Plan (Grant numbers: 2007AA01Z475, 2007AA01Z480, 2007AA01Z464).

Author information

Corresponding author

Correspondence to Xiaohong Guan.

Additional information

This paper was processed by Ling Li, R. Valerdi and J. Warfield.

About this article

Cite this article

Yuan, R., Li, Z., Guan, X. et al. An SVM-based machine learning method for accurate internet traffic classification. Inf Syst Front 12, 149–156 (2010). https://doi.org/10.1007/s10796-008-9131-2

  • DOI: https://doi.org/10.1007/s10796-008-9131-2
