Abstract
Classification of bandwidth-heavy Internet traffic is important for network administrators who need to throttle traffic from bandwidth-heavy applications. Statistical methods have previously been proposed as a promising way to identify Internet traffic based on statistical features of packets. The selection of statistical features still plays an important role in accurate and timely classification. In this work, we propose an approach based on feature selection methods and analytic methods (scatter, one-way analysis of variance) to provide optimal features for on-line P2P traffic detection. Feature selection algorithms and machine learning algorithms were implemented using the WEKA tool on available traces from the University of Brescia, the University of Aalborg, and the University of Cambridge. Experimental results show that the proposed method achieves up to 99.5% accuracy with just six on-line statistical features. These results are better than those of other existing approaches in terms of accuracy and number of features.
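As an illustration of the one-way analysis of variance step mentioned above, the following is a minimal sketch of ranking flow features by their ANOVA F-statistic across traffic classes. It is not the authors' WEKA-based pipeline: the function names, the feature names (`pkt_size`, `inter_arrival`), the class labels, and the toy values are all hypothetical and chosen only to show the computation.

```python
from collections import defaultdict


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    if msw == 0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if msb > 0 else 0.0
    return msb / msw


def rank_features(features, labels):
    """Rank features (name -> list of values) by F-statistic, highest first."""
    scores = []
    for name, values in features.items():
        by_class = defaultdict(list)
        for value, label in zip(values, labels):
            by_class[label].append(value)
        scores.append((name, anova_f(list(by_class.values()))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: six flows, two classes.
    labels = ["p2p", "p2p", "p2p", "other", "other", "other"]
    features = {
        "pkt_size": [1, 2, 3, 4, 5, 6],
        "inter_arrival": [1, 4, 2, 5, 3, 6],
    }
    for name, score in rank_features(features, labels):
        print(name, round(score, 3))
```

A feature with a large F-statistic has class means that differ strongly relative to the spread within each class, which is one reason such a score can serve as a filter before a classifier is trained on the reduced feature set.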
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ali Abdalla, B.M., Jamil, H.A., Hamdan, M., Bassi, J.S., Ismail, I., Marsono, M.N. (2017). Multi-stage Feature Selection for On-Line Flow Peer-to-Peer Traffic Identification. In: Mohamed Ali, M., Wahid, H., Mohd Subha, N., Sahlan, S., Md. Yunus, M., Wahap, A. (eds) Modeling, Design and Simulation of Systems. AsiaSim 2017. Communications in Computer and Information Science, vol 752. Springer, Singapore. https://doi.org/10.1007/978-981-10-6502-6_44
DOI: https://doi.org/10.1007/978-981-10-6502-6_44
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6501-9
Online ISBN: 978-981-10-6502-6
eBook Packages: Computer Science (R0)