An SVM-based machine learning method for accurate internet traffic classification
Accurate and timely traffic classification is critical in network security monitoring and traffic engineering. Traditional methods based on port numbers and protocols have proven ineffective in the presence of dynamic port allocation and packet encapsulation. Signature-matching methods, on the other hand, require a known signature set and processing of the packet payload, and can handle the signatures of only a limited number of IP packets in real time. A machine learning method based on SVM (support vector machine) is proposed in this paper for accurate Internet traffic classification. The method classifies Internet traffic into broad application categories according to network flow parameters obtained from the packet headers. An optimized feature set is obtained via multiple classifier selection methods. Experimental results using traffic from a campus backbone show that an accuracy of 99.42% is achieved with the regular biased training and testing samples. An accuracy of 97.17% is achieved when unbiased training and testing samples are used with the same feature set. Furthermore, as all the feature parameters are computable from the packet headers, the proposed method is also applicable to encrypted network traffic.
Keywords: Internet traffic; Network traffic classification; Machine learning; Feature selection; SVM
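The abstract states that every feature is computable from packet headers alone. As a minimal sketch of what that can look like, the Python below groups header records into bidirectional flows and derives a few per-flow statistics. The record layout, the helper names `flow_key` and `flow_features`, and the four statistics are assumptions chosen for illustration; the abstract does not list the paper's optimized feature set, and this is not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical header record, for illustration only:
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes)


def flow_key(pkt):
    """Return a direction-independent key so both directions share one flow."""
    _, src, dst, sport, dport, proto, _ = pkt
    a, b = (src, sport), (dst, dport)
    return (min(a, b), max(a, b), proto)


def flow_features(packets):
    """Group packet headers into flows and compute simple per-flow statistics."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)

    features = {}
    for key, pkts in flows.items():
        times = [p[0] for p in pkts]
        sizes = [p[6] for p in pkts]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "duration_s": max(times) - min(times),
            "mean_packet_bytes": mean(sizes),
        }
    return features


# Made-up header records: one TCP flow seen in both directions, one UDP flow.
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 51000, 80, "tcp", 60),
    (0.05, "10.0.0.2", "10.0.0.1", 80, 51000, "tcp", 1500),
    (0.20, "10.0.0.1", "10.0.0.2", 51000, 80, "tcp", 40),
    (1.00, "10.0.0.3", "10.0.0.2", 40000, 53, "udp", 80),
]
feats = flow_features(packets)
```

Each resulting dictionary could be turned into a numeric vector and passed to an SVM classifier. No payload bytes are read at any point, which is why a header-only approach of this kind remains usable when the payload is encrypted.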
The research presented in this paper is supported in part by the NSFC (Grant numbers: 60243001, 60574087, 60605019, 60633020) and the 863 High Tech Development Plan (Grant numbers: 2007AA01Z475, 2007AA01Z480, 2007AA01Z464).