Information Systems Frontiers, Volume 12, Issue 2, pp 149–156

An SVM-based machine learning method for accurate internet traffic classification

Authors

  • Ruixi Yuan
    • Center for Intelligent and Networked Systems, TNLIST Lab, Tsinghua University
  • Zhu Li
    • Center for Intelligent and Networked Systems, TNLIST Lab, Tsinghua University
  • Xiaohong Guan
    • Center for Intelligent and Networked Systems, TNLIST Lab, Tsinghua University
    • MOE KLINNS Lab and SKLMS Lab, Xi’an Jiaotong University
  • Li Xu
    • College of Economics and Management, Beijing Jiaotong University
    • Department of Information Technology and Decision Science, Old Dominion University
Article

DOI: 10.1007/s10796-008-9131-2

Cite this article as:
Yuan, R., Li, Z., Guan, X. et al. Inf Syst Front (2010) 12: 149. doi:10.1007/s10796-008-9131-2

Abstract

Accurate and timely traffic classification is critical in network security monitoring and traffic engineering. Traditional methods based on port numbers and protocols have proven ineffective because of dynamic port allocation and packet encapsulation. Signature matching methods, on the other hand, require a known signature set and processing of the packet payload, and can handle the signatures of only a limited number of IP packets in real time. This paper proposes a machine learning method based on SVM (support vector machine) for accurate Internet traffic classification. The method classifies Internet traffic into broad application categories according to network flow parameters obtained from the packet headers. An optimized feature set is obtained via multiple classifier selection methods. Experimental results using traffic from a campus backbone show that an accuracy of 99.42% is achieved with the regular biased training and testing samples. An accuracy of 97.17% is achieved when unbiased training and testing samples are used with the same feature set. Furthermore, as all the feature parameters are computable from the packet headers, the proposed method is also applicable to encrypted network traffic.

Keywords

Internet traffic · Network traffic classification · Machine learning · Feature selection · SVM

1 Introduction

In recent years, more and more business applications have been facilitated by the Internet, ranging from e-commerce to e-business (Li et al. 2007a, b). The Internet has become an integral part of the activities of many businesses today (Beheshti et al. 2007; Guo 2007; Wang and Archer 2007). As a result, Internet traffic monitoring and control have attracted an increasing amount of interest in the past few years. From a security perspective, fast identification of malicious traffic helps control and isolate attackers. From the QoS perspective, accurate classification of different traffic types helps to identify the applications utilizing network resources and facilitates the instrumentation of QoS for different applications. Furthermore, network operators can trace the growth of different applications and provision the network accordingly to accommodate the diverse needs of the user population.

Traditionally, the identification of network applications has been realized by inspecting the well-known port numbers obtainable from the packet header. However, many new applications do not use well-known ports, and the technique of port tunneling employs well-known ports (e.g. 80) to tunnel other applications. Therefore, a well-known port number is no longer a reliable indicator of a network application.

Precise signature matching is a widely used method in intrusion detection systems (Vigna et al. 2004; Shon and Moon 2007), and is the most accurate method for traffic identification. However, these techniques are unable to adapt to new applications with no signatures, or to newer versions of the same application in which the signature has changed. Developing and maintaining an accurate signature database can be expensive and time-consuming. Furthermore, as a resource-intensive operation, precise matching of a large number of signatures within the applications requires the capture and storage of application data across multiple IP packets. In addition, application signature matching is not feasible for network applications using encryption for data protection. For example, the data stream of the popular Skype application is encrypted to protect the voice and user data channels.

A fundamental aspect of traffic classification is classification granularity. Some approaches classify the traffic into two groups, “normal” and “abnormal” (Lakhina et al. 2004). Some attempt to identify exact application types such as “BitTorrent” or “eDonkey” (Sen et al. 2004). A middle ground can also be found that classifies the traffic into multiple broad-based categories, each representing a common group of applications (Moore and Zuev 2005a). For example, the “MAIL” category would include “SMTP”, “POP”, “IMAP”, and potentially other email applications.

Pattern recognition, which aims to classify data based on either a priori knowledge or statistical information extracted from raw data, is a powerful tool for data separation in many disciplines (Duan et al. 2007, 2008; Feng et al. 2001; Li and Xu 2001; Li et al. 2007a, b; Li et al. 2008; Luo et al. 2007; Shi et al. 1996, 1999, 2007; Xu 1999, 2006). The patterns to be classified are usually groups of observations (parameters) defining data points in an appropriate multidimensional space. Therefore, pattern recognition methods are well suited to Internet traffic classification, as long as the traffic can be classified into categories that exhibit similar characteristics in those parameters. Supervised machine learning methods, which use known training samples to acquire feature parameters and build proper models to classify unknown samples, are particularly pertinent to traffic classification.

Recently, to overcome the deficiencies of traditional traffic classification methods based on port/protocol and IDS-type signature matching, several machine learning techniques have been proposed to classify Internet traffic, each with reasonable success. These related works are described in the next section.

2 Related work

A number of studies have attempted to classify Internet traffic into exact applications. The flow duration and average packet size of a flow were used to classify network traffic (Roughan et al. 2004); the nearest neighbor method and linear discriminant analysis were used to train the classifier from known samples, and a classification accuracy of 90% was achieved for seven applications {domain, ftp-data, https, kazaa, realmedia, telnet, and www}. Haffner et al. reported that three statistical machine learning algorithms, Naive Bayes, AdaBoost, and Regulated Maximum Entropy, were used to classify traffic based on a feature vector (n*256 elements) derived from the initial n bytes of application data (Haffner et al. 2005). Each dimension is a Boolean variable whose value (1 or 0) depends on the value of the corresponding byte. It was found that the AdaBoost method yields a high degree of accuracy (99%) with either 64 or 256 bytes of application data for seven distinct applications: {ftp-control, smtp, pop, imap, http, https, and ssh}. This work is equivalent to constructing application signatures with a machine learning method; hence a high-dimensional feature vector is required.

Early et al. adopted the decision tree method to train a classifier for {http, ftp, smtp and telnet} applications, using the probabilities of FIN and PUSH packets, the average packet size, and the RTT of a flow as features (Early et al. 2003). The unknown flows were then classified with the decision tree, and an accuracy of 93% was obtained.

Bernaille et al. proposed using the first few packets of a TCP flow to identify the application at an early stage (Bernaille et al. 2006). The method uses the k-means algorithm to separate the traffic into clusters. An 80% accuracy is achieved for the distinct applications {edonkey, ftp, http, kazaa, nntp, smtp, ssh, https, and pop3s}, while the POP3 application was misidentified because it fell into the NNTP cluster.

Moore et al. conducted a series of studies attempting to classify network traffic into several broad-based groups, as shown in Table 1 (Moore and Zuev 2005a). In this type of classification, applications with similar dynamics are classified into the same class. A naive Bayesian estimator is used in the algorithm: the Bayes formula is used to calculate the posterior probability of a testing sample, and the class with the largest probability is selected as the classification result.
Table 1  Internet traffic classes

Traffic class    Representative applications
Bulk             ftp
Interactive      ssh, telnet, rlogin
Mail             pop3, smtp, imap
Service          X11, dns
WWW              http, https
P2P              Kazaa, BitTorrent, Gnutella
Multimedia       Voice, video streaming
Game             Half-life
Attack           Worm, virus
Others           Scan, netbios, ntp, tsp

A total of about 200 features of a network flow are used to train the model, and a kernel-based function is used to estimate the distribution function (Moore and Zuev 2005b). The total accuracy is about 95% in terms of the number of flows correctly classified and 84% in terms of flow size.

While these methods offer various degrees of success, there are several limitations:

  1) The computation of these algorithms is highly complex. In one algorithm, about 200 features need to be computed for a single flow (Moore and Zuev 2005b). In another, because of its high dimensionality, it takes several hours for the algorithm to converge (Haffner et al. 2005). Although relatively high accuracy is achieved, these methods do not fit real-time scenarios because of their computation and storage requirements.

  2) Models built with certain pattern recognition methods, such as Bayesian estimation, decision trees, and nearest neighbor, may be trapped in local optima.

  3) Accuracy is highly dependent on the samples’ prior probabilities. The training and testing samples may be biased towards a certain class of traffic. For example, WWW traffic constitutes the large majority of the samples in (Moore and Zuev 2005a).
In numerous previous studies, the number of training and testing samples for each application is based upon its actual ratio in the network. While this is reasonable for overall traffic, it can sometimes lead to unusually high classification accuracy. For example, a traffic sample consisting of 95% WWW traffic has at least 95% classification accuracy when all WWW traffic is identified, even if all other traffic classes are misclassified. Therefore, to assess the effectiveness of a classification method, it is helpful to study the classification accuracy with unbiased training and testing samples.

To address the above-mentioned problems, we took several steps to improve the speed and accuracy of machine learning methods for Internet traffic classification:

  1) We reduced the number of features extracted from a network flow. All of the features can be obtained in real time from packet headers.

  2) We used an SVM method, which is a maximum-margin classifier and can avoid local optima.

  3) We compared the classification accuracy for both biased and unbiased training samples.

  4) We adopted a discriminator selection algorithm to obtain the best combination of features for classification. This optimal set of discriminators not only yields high accuracy, but also offers insight into the factors affecting the classification. It can guide our future work of classifying network flows with other pattern recognition methods.
As a result, our optimized method yields an accuracy of 97.17% for unbiased training and testing samples when only nine feature parameters are used. For regular network traffic (normally biased towards WWW in terms of flow numbers), the same method has an accuracy of 99.42%. This suggests that the discriminator optimization is independent of the traffic mix of the sample and is valid across a broad range of traffic profiles.

The remainder of this paper is organized as follows: Section 3 describes the data set used for the experiments. Section 4 introduces the classification methodology, including the SVM-based method with an RBF kernel function, the performance evaluation method using cross-validation, and the discriminator selection algorithms. Section 5 presents the experimental results and their analysis. Section 6 concludes the paper and discusses future work.

3 Data for experiment

The experiment data were obtained from a backbone router of the campus network of our university. A set of 8-hour traffic traces on a Gbps Ethernet link was collected within a one-week period. The packets were first separated into unidirectional flows according to the five-tuple (srcIP, desIP, Prot, srcPort, desPort), and then the unidirectional flows were combined into bidirectional flows based on the overlapping time spans of the flows. The first 250 KB of payload data from each flow was also stored (or the entire payload if it was less than 250 KB) for offline traffic identification, because the vast majority of application signatures appear in the initial part of the payload; it is generally unnecessary to store the full payload of longer traffic flows.
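As an illustration of this flow aggregation step (a minimal sketch, not the collection tool used in the study), the code below groups parsed packet headers into unidirectional flows keyed by the five-tuple and then pairs each flow with its mirrored tuple into a bidirectional flow. The field names are hypothetical, and the overlapping-time-span check mentioned above is omitted for brevity.

```python
from collections import defaultdict

def build_flows(packets):
    """Group parsed packet headers into unidirectional flows keyed by the
    five-tuple, then pair mirrored flows into bidirectional flows.
    `packets` is an iterable of dicts with hypothetical keys:
    src_ip, dst_ip, proto, src_port, dst_port, size, ts."""
    uni = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
               pkt["src_port"], pkt["dst_port"])
        uni[key].append(pkt)

    bi = {}
    for key, pkts in uni.items():
        src_ip, dst_ip, proto, sport, dport = key
        reverse = (dst_ip, src_ip, proto, dport, sport)
        # Use a canonical ordering so a flow and its reverse map to one entry.
        canon = min(key, reverse)
        bi.setdefault(canon, {"fwd": [], "rev": []})
        direction = "fwd" if key == canon else "rev"
        bi[canon][direction].extend(pkts)
    return bi
```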

The collected traffic data were first identified by precise signature matching on the payload. The signatures were represented using regular expressions. For example, the signature for the SMTP protocol is ^220[\x09-\x0d -~]*(e?smtp|simple mail), which matches strings such as:
  • 220 mail.stalker.com ESMTP CommuniGate Pro 4.1.3
  • 220 mail.vieodata.com ESMTP Merak 6.1.0; Mon, 15 Sep 2003 13:48:11 -0400
  • 220 mail.ut.caldera.com ESMTP
  • 220 persephone.pmail.gen.nz ESMTP server ready.
Regular expressions for the signatures of 90 popular applications were selected to match the collected payloads (Sourceforge 2006). The signature matching identified approximately 70% of the traffic flows, most of which are TCP flows. The identified flows are listed in Table 2. The reason that only 70% of the traffic flows were identified is twofold. First, the signature set is relatively old (Sourceforge 2006); therefore, many newer applications or variants cannot be identified. Second, some flows are incomplete and do not contain a complete signature.
Table 2  Data set for network flow experiment

Traffic class    Application                             Number of flows
Bulk             ftp, xunlei                             14,111
Interactive      telnet, irc, jabber                     71
Mail             smtp, pop3                              2,245
Service          whois, dns                              16,006
WWW              http, https                             48,827
P2P              bittorrent, qq, edonkey, Skype, etc.    10,546
Others           netbios, ntp, tsp, smb, etc.            61,832

Table 2 shows that the majority of the traffic flows are WWW and service flows. Three classes, Game, Multimedia, and Attack, had too few flows in the data and are therefore excluded from the data set: the Skype voice traffic is encrypted, no known attack was present during the data collection period, and only one game flow (xboxlive) was identified in the data. Therefore, in the subsequent study, we focused on the seven traffic classes that have enough samples to be statistically significant.

For each bidirectional flow, 19 parameters are computed from the packet headers to serve as the discriminators for the classification algorithms. These parameters are all obtainable in real time from the packet headers without storing the packets. For both UDP and TCP flows, they include the known fields in the packet header as well as the average and variance of the packet sizes. Discriminators 15–19 are applicable to TCP flows only and are set to 0 for UDP flows.
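As a sketch of how such header-based discriminators can be derived (the helper and its field names are hypothetical, not the authors' code), a few of the per-flow features can be computed as follows.

```python
import statistics

def flow_features(fwd_pkts, rev_pkts):
    """Compute a subset of the header-based discriminators for one
    bidirectional flow. fwd_pkts/rev_pkts are lists of dicts with
    hypothetical keys 'size' (bytes) and 'ts' (timestamp in seconds)."""
    all_pkts = fwd_pkts + rev_pkts
    sizes = [p["size"] for p in all_pkts]
    send_sizes = [p["size"] for p in fwd_pkts]
    return {
        "total_packets": len(all_pkts),
        "avg_packet_size": statistics.mean(sizes),
        "send_packets": len(fwd_pkts),
        "avg_send_size": statistics.mean(send_sizes),
        "send_size_var": statistics.pvariance(send_sizes),
        "duration": max(p["ts"] for p in all_pkts) - min(p["ts"] for p in all_pkts),
        # Ratio of sent to received packets (guard against an empty reverse direction).
        "pkt_ratio": len(fwd_pkts) / max(len(rev_pkts), 1),
    }
```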

4 Classification method

4.1 SVM

The Support Vector Machine (SVM), based on statistical learning theory, is known as one of the best machine learning algorithms for classification and has been successfully applied to many classification problems such as image recognition, text categorization, medical diagnosis, remote sensing, and motion classification (Bazi and Melgani 2006; Bellotti and Crook 2008; Burges 1998; Huang et al. 2008; Liu et al. 2008; Shon and Moon 2007; Yan et al. 2008). The SVM method was selected as our classification algorithm because of its ability to simultaneously minimize the empirical classification error and maximize the geometric classification margin. These properties reduce the structural risk of over-learning with limited samples.

Selecting a kernel is an important aspect of SVM-based classification; commonly used kernel functions include LINEAR, POLY, RBF, and SIGMOID. Different kernel functions create different non-linear separation surfaces. Another important parameter in SVM is the penalty parameter C, which controls how heavily misclassified training samples are penalized and has an impact on the experimental results.

The effects of different kernel functions and penalty parameters on the classification accuracy have been studied and the results are discussed in Section 5.
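A minimal sketch of the classification step, using scikit-learn's SVC as a stand-in for the SVM implementation (the library choice and the stand-in data are assumptions); the RBF kernel and the penalty parameter C mirror the settings examined in Section 5.1.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((400, 19))           # stand-in for the scaled flow discriminators
y = rng.integers(0, 7, size=400)    # stand-in for the seven traffic class labels

# RBF kernel with penalty parameter C; values mirror those chosen in Section 5.1.
clf = SVC(kernel="rbf", C=1024, gamma=0.5)
clf.fit(X, y)
print(clf.predict(X[:5]))
```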

4.2 Cross validation

In supervised machine learning, if the training samples and the testing samples are the same, the accuracy can be made artificially high. Therefore, it is very important for these two types of samples to be different. We adopt the cross-validation method to evaluate the accuracies of our experiment (Kohavi 1995).

In the n-fold cross-validation scheme, samples are divided into n subsets of equal size. Sequentially, each subset is tested using the classifier trained on the remaining n−1 subsets. Thus, each instance of the whole training set is tested once, and the overall cross-validation accuracy is the average across the entire data set. The prediction accuracy obtained by cross-validation reflects the performance on unknown data more precisely.

In general, the value of n does not affect the cross-validation accuracy much as long as it is small compared to the number of samples in the entire data set. Therefore, in the Internet traffic classification experiments, n = 10.
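A corresponding sketch of the 10-fold evaluation, again with scikit-learn assumed and stand-in data: each fold is tested on a model trained on the other nine folds, and the mean accuracy over folds is reported.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((400, 19))           # stand-in for the scaled discriminators
y = rng.integers(0, 7, size=400)    # stand-in for the seven traffic classes

# cv=10: each of the 10 subsets is tested on a model trained on the other 9.
scores = cross_val_score(SVC(kernel="rbf", C=1024, gamma=0.5), X, y, cv=10)
print("10-fold cross-validation accuracy: %.2f%%" % (100 * scores.mean()))
```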

4.3 Discriminator selection algorithms

In general, different features have different effects on the classification accuracy. Some features may have larger positive effects on the classification, while others may have smaller effects; some features may even have negative impacts. Therefore, we must carefully choose the best combination of features as SVM discriminators to optimize the classification. The main discriminator selection methods are described in the following.

4.3.1 Optimum searching method

Every possible combination of discriminators is evaluated, and the combination that yields the best result is selected. In this study, with 19 features included in the experiment, a total of 524,287 combinations would need to be evaluated. Obviously, evaluating them all is computationally prohibitive.

4.3.2 Hypo-optimum searching method

This method can produce good results with much less computation. The two main algorithms are:
  a) Sequential forward selection. Beginning with no features chosen, sequentially append the one feature that yields the best classification result to the chosen set. This step is performed repeatedly, and the combination with the best classification accuracy is finally selected.

  b) Plus-m-minus-r algorithm. This is an extension of the sequential forward selection algorithm. Beginning with no features chosen, sequentially append m features to the chosen set and remove r features from it (m > r), selecting the feature set that yields the best classification result. This step is performed repeatedly, and the combination with the best classification accuracy is finally selected. Sequential forward selection can be seen as the plus-1-minus-0 algorithm.

Both the sequential forward selection and plus-m-minus-r algorithms have been evaluated in the experiment. The results are presented in Section 5.
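A compact sketch of sequential forward selection driven by cross-validated accuracy (the classifier settings and scoring data are placeholders, not the authors' implementation); the plus-m-minus-r variant extends this loop by also removing the r least useful features after each group of additions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_selection(X, y, n_folds=10):
    """Greedy sequential forward selection over the feature columns of X."""
    remaining = list(range(X.shape[1]))
    chosen, best_overall, best_set = [], 0.0, []
    while remaining:
        # Try adding each remaining feature and keep the one with the best score.
        scores = {f: cross_val_score(SVC(kernel="rbf", C=1024, gamma=0.5),
                                     X[:, chosen + [f]], y, cv=n_folds).mean()
                  for f in remaining}
        best_f = max(scores, key=scores.get)
        chosen.append(best_f)
        remaining.remove(best_f)
        if scores[best_f] > best_overall:
            best_overall, best_set = scores[best_f], list(chosen)
    return best_set, best_overall
```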

5 Classification results and analysis

5.1 SVM model tuning

The 19 parameters in Table 3 are of different types and can take very different values. To make the discriminators suitable for the SVM algorithm, these parameters are pre-treated with a logarithm function so that they all fall into the same value range, between 0 and 1.
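One plausible reading of this pre-treatment (the exact transformation constants are not given in the text, so this sketch is an assumption) is to take the logarithm of each parameter and rescale each column to [0, 1].

```python
import numpy as np

def log_scale(X):
    """Log-transform each discriminator and rescale each column to [0, 1].
    A hypothetical reading of the pre-treatment; the paper does not give
    the exact constants."""
    logged = np.log1p(np.abs(X))               # log1p avoids log(0)
    col_min = logged.min(axis=0)
    col_range = np.ptp(logged, axis=0)
    col_range[col_range == 0] = 1.0            # guard against constant columns
    return (logged - col_min) / col_range
```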
Table 3  Network flow features

ID   Flow discriminator
1    Total number of packets in the flow
2    The average packets size of a flow
3    The number of packets sent for the flow
4    The average send packets size of a flow
5    The variance of send packets’ size
6    The average receive packets size of a flow
7    The variance of receive packets’ size
8    The variance of received packets’ size
9    The duration of the flow
10   The protocol (TCP or UDP)
11   The source port of a flow
12   The destination port of a flow
13   The number ratio of send and receive packets
14   The byte ratio of send and receive packets
15   The number of SYN packets
16   The number of RST packets (rst)
17   The number of FIN packets (fin)
18   The average window size (window_size)
19   The variance of window size

To evaluate the effects of kernel functions, four commonly used kernel functions, LINEAR, POLY, RBF, and SIGMOID, were tested. Two hundred samples were selected from each traffic class, and all 19 features were used as discriminators to evaluate the classification accuracy of the SVM method. Cross-validation results with n = 10 are shown in Table 4.
Table 4  Testing different kernel functions

Kernel function   Total accuracy (%)
LINEAR            87.10
POLY              90.83
RBF               93.38
SIGMOID           15.92

Clearly, the RBF kernel function gives the best classification accuracy. Therefore, the RBF kernel function was used in the subsequent experiments.

Figure 1 shows the classification accuracy contour map for different penalty parameter C and the RBF parameter γ.
Fig. 1

Effect of penalty parameter and RBF function on classification accuracy

The figure shows that the classification accuracy is quite stable over a wide range of the penalty parameter C when the RBF kernel parameter γ takes values around 1. Therefore, γ = 1/2 and C = 2^10 = 1,024 were set.
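The contour in Fig. 1 corresponds to a grid search over C and γ. A sketch with scikit-learn's GridSearchCV (the library, the power-of-two grid, and the stand-in data are assumptions) would look like the following.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((400, 19))           # stand-in for the scaled discriminators
y = rng.integers(0, 7, size=400)    # stand-in for the seven traffic classes

# Search C and gamma over powers of two, as is conventional for RBF SVMs.
param_grid = {"C": [2 ** k for k in range(-2, 12, 2)],
              "gamma": [2 ** k for k in range(-6, 4, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```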

5.2 Comparison between biased and un-biased training samples

As mentioned earlier, in many previous studies the number of training and testing samples for each application depends upon its actual ratio in the network. At first glance, this approach seems reasonable. However, since the number of samples for some applications may be much larger than for others, the classification result will be heavily biased towards the classes with more training samples, while the classes with few samples may exhibit higher error rates during classification. This bias will not hurt the total classification accuracy; in fact, it will actually increase the accuracy, as more samples can be correctly classified. However, this increase in accuracy comes at the cost of misclassifying underrepresented applications. For a classification algorithm to perform well, it is sometimes desirable to have relatively uniform accuracy across a range of applications. Therefore, we performed two experiments. First, we chose the training samples according to their actual ratio in the overall data set. Second, we chose the same number of samples (200) within each traffic class. The classification results are shown in Table 5 and Table 6. Again, all 19 discriminators were used, with C = 1,024 and an RBF kernel with γ = 1/2.
Table 5  Classification accuracy for each class under the biased-prior-probabilities condition

Traffic class   False negative (%)   False positive (%)
Bulk traffic    1.02                 1.79
Interactive     25.49                4.23
WWW             0.20                 0.36
Service         0.00                 0.45
P2P             1.53                 0.84
Mail            13.99                3.10
Other           0.89                 0.70

Table 6  Classification accuracy for each class under the unbiased-prior-probabilities condition

Traffic class   False negative (%)   False positive (%)
Bulk traffic    6.00                 9.00
Interactive     8.68                 7.04
WWW             7.00                 7.00
Service         5.00                 8.00
P2P             5.50                 3.00
Mail            5.78                 4.62
Other           6.50                 7.50

The experimental results in Table 5 were obtained with the following data set: bulk traffic: 2,237 flows; interactive: 71 flows; WWW: 25,864 flows; service: 7,348 flows; P2P: 2,021 flows; mail: 902 flows; other services: 8,533 flows. The weighted average accuracy across all traffic classes is 99.41%, which appears very high. However, the false negative and false positive ratios of each class vary significantly. The classes with more training samples, such as WWW and service, have low false negative ratios (0.20% and 0.00%, respectively), while the false negative ratios for the classes with fewer training samples, interactive and mail, are 25.49% and 13.99%, respectively. Although these ratios are unacceptable for those classes, they hardly affect the total accuracy.

The experimental results shown in Table 6 were obtained with an unbiased sample set in which the number of samples in every traffic class was set to 200, randomly chosen from all data samples. The weighted average accuracy across all traffic classes is 92.81%; Table 6 shows the false negative and false positive ratios of each class.

Although the total accuracy is not as high as the result with biased samples, the false negative and false positive ratios are nearly uniform across the classes, reflecting the intrinsic network characteristics of the different applications. Therefore, we used samples with unbiased prior probabilities in our subsequent experiments.
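The per-class false negative and false positive ratios reported in Tables 5 and 6 can be derived from a confusion matrix. The sketch below (assuming scikit-learn; the exact denominators used in the paper are not stated, so the ones here are one plausible choice) computes both ratios per traffic class.

```python
from sklearn.metrics import confusion_matrix

def per_class_fn_fp(y_true, y_pred, labels):
    """False negative and false positive ratios per traffic class, as
    percentages of that class's true flows (FN) and predicted flows (FP)."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    rates = {}
    for i, label in enumerate(labels):
        fn = cm[i, :].sum() - cm[i, i]   # class-i flows labelled as something else
        fp = cm[:, i].sum() - cm[i, i]   # other flows labelled as class i
        rates[label] = (100.0 * fn / max(cm[i, :].sum(), 1),
                        100.0 * fp / max(cm[:, i].sum(), 1))
    return rates
```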

5.3 Optimizing the discriminator set

In order to find the best discriminator set among the 19 available parameters for SVM classification, we first used the sequential forward selection method. The result is shown in Table 7; for convenience, only the feature IDs from Table 3 are used to represent the features. The best discriminator set obtained is {12, 7, 18, 11, 14, 13, 1, 10, 2}, and it yielded a classification accuracy of 95.98%.
Table 7  Discriminator optimization by sequential forward selection method

Selected feature set                                                  Accuracy (%)
12                                                                    81.75
12, 7                                                                 90.91
12, 7, 18                                                             92.67
12, 7, 18, 11                                                         94.84
12, 7, 18, 11, 14                                                     94.44
12, 7, 18, 11, 14, 13                                                 94.17
12, 7, 18, 11, 14, 13, 1                                              94.57
12, 7, 18, 11, 14, 13, 1, 10                                          94.71
12, 7, 18, 11, 14, 13, 1, 10, 2                                       95.98
12, 7, 18, 11, 14, 13, 1, 10, 2, 6                                    93.89
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8                                 93.62
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3                              93.76
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15                          93.62
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17                      93.35
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17, 16                  94.17
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17, 16, 9               93.08
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17, 16, 9, 19           93.35
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17, 16, 9, 19, 4        93.36
12, 7, 18, 11, 14, 13, 1, 10, 2, 6, 8, 3, 15, 17, 16, 9, 19, 4, 5     92.81
Best sequence: 12, 7, 18, 11, 14, 13, 1, 10, 2                        95.98

The sequential forward selection method has a shortcoming: once a feature is selected and included in the chosen set, it cannot be removed even if it becomes unsuitable when other features are added; hence the search may be trapped in a local optimum. Therefore, we adopted a plus-2-minus-1 strategy for the next experiment. The result is shown in Table 8.
Table 8  Discriminator optimization by plus-2-minus-1 method

Selected feature set                                                  Accuracy (%)
12                                                                    81.75
12, 7                                                                 90.91
12, 2, 14                                                             93.96
12, 14, 5, 9                                                          95.12
12, 14, 5, 9, 15                                                      96.14
12, 14, 5, 9, 15, 2                                                   96.27
12, 14, 5, 9, 15, 18, 8                                               96.79
12, 14, 5, 9, 15, 8, 10, 2                                            96.53
12, 14, 5, 9, 15, 8, 10, 2, 7                                         97.17
12, 14, 5, 9, 15, 8, 10, 2, 7, 3                                      96.79
12, 14, 5, 9, 15, 8, 10, 2, 7, 6, 16                                  96.66
12, 14, 5, 9, 15, 8, 10, 2, 7, 6, 4, 13                               96.27
12, 5, 9, 15, 8, 10, 2, 7, 6, 4, 13, 11, 16                           96.53
12, 5, 9, 15, 8, 10, 2, 7, 6, 4, 13, 11, 16, 17                       96.40
12, 5, 9, 15, 8, 10, 7, 6, 4, 13, 11, 16, 17, 1, 3                    96.27
12, 5, 9, 15, 8, 10, 6, 4, 13, 11, 16, 17, 1, 3, 19, 14               96.40
12, 5, 9, 15, 8, 10, 6, 4, 13, 11, 16, 17, 1, 3, 19, 18, 7            96.14
12, 5, 9, 15, 8, 10, 6, 4, 13, 16, 17, 1, 3, 19, 18, 7, 2, 14         96.27
12, 5, 9, 15, 8, 10, 6, 4, 13, 16, 17, 1, 3, 19, 18, 7, 2, 14, 11     96.02
Best sequence: 12, 14, 5, 9, 15, 8, 10, 2, 7                          97.17

Table 8 shows that some unsuitable features were removed from the set even though they had been included in previous iterations. The best sequence is easily located as {12, 14, 5, 9, 15, 8, 10, 2, 7}, with an accuracy of 97.17%, an improvement of slightly over 1%.

In order to evaluate whether the optimized discriminator set is also applicable to the biased sample scenario, we used the same feature set {12, 14, 5, 9, 15, 8, 10, 2, 7} to classify the samples with biased prior probabilities. The result is shown in Table 9.
Table 9  Classification accuracy for each class under the biased-prior-probabilities condition with the optimized discriminator set

Traffic class   False negative (%)   False positive (%)
Bulk traffic    1.79                 0.94
Interactive     10.86                4.23
WWW             0.31                 0.50
Service         0.05                 0.03
P2P             2.00                 2.82
Mail            9.34                 2.44
Other           0.26                 0.46

The weighted average classification accuracy across all traffic classes is 99.42%, slightly better than the previous result with all 19 discriminators. In addition, compared to Table 5, the false negative ratios of the underrepresented classes decreased to a more acceptable level. Although the worst classes are still Interactive and Mail, owing to their fewer training samples, their false negative ratios are now 10.86% and 9.34%, respectively, better than those in Table 5 (25.49% and 13.99%).

6 Summary and future work

The SVM-based method developed in this paper classifies Internet traffic into broad traffic classes, each containing applications that share common network behavior. Experiments were carried out using traffic samples collected from a real campus backbone. The results show that using flow parameters obtained from packet headers only can produce highly accurate classification. An accuracy of 99.42% is achieved with the regular biased training and testing samples, which follow the traffic mix of the real network environment. An accuracy of 97.17% is achieved even in the unbiased sample scenario, where the classification error is evenly distributed across the traffic classes.

The proposed method is also applicable to encrypted network traffic, since it does not rely on the application payload for classification. Furthermore, as all the feature parameters are computable without storing multiple packets, the method lends itself well to real-time traffic identification. For the data sets tested, the optimized feature set contains only nine discriminators. The SVM method based on the RBF kernel function is computationally more efficient than previous methods with similar accuracies.

The fact that the optimized discriminator set is applicable to different traffic mixes is also interesting. We argue that the stability of these discriminators is inherent to the statistical properties of the traffic classes. Thus it could guide our future work in choosing which features to use when classifying new network applications.

One of the disadvantages of SVM-based and other supervised machine learning methods is the requirement for a large number of labeled training samples. Moreover, identifying the traffic only after the whole network flow has been collected could be too late if security or QoS intervention becomes necessary at an early stage of the flow. In our future work, we intend to combine supervised and unsupervised machine learning methods, as well as use feature parameters obtainable early in the traffic flow, for fast and accurate Internet traffic classification.

Acknowledgements

The research presented in this paper is supported in part by the NSFC (Grant numbers: 60243001, 60574087, 60605019, 60633020) and 863 High Tech Development Plan (Grant numbers: 2007AA01Z475, 2007AA01Z480, 2007AA01Z464).

Copyright information

© Springer Science+Business Media, LLC 2008