Multi-stage Feature Selection for On-Line Flow Peer-to-Peer Traffic Identification

  • Conference paper
Modeling, Design and Simulation of Systems (AsiaSim 2017)

Abstract

Classifying bandwidth-heavy Internet traffic is important for network administrators who need to throttle traffic from heavy-bandwidth applications. Statistical methods, which identify Internet traffic from statistical features of its packets, have previously been proposed as a promising approach. The choice of statistical features, however, still plays an important role in achieving accurate and timely classification. In this work, we propose an approach that combines feature selection methods with analytic methods (scatter analysis and one-way analysis of variance) to provide optimal features for on-line P2P traffic detection. The feature selection and machine learning algorithms were implemented using the WEKA tool and evaluated on available traces from the University of Brescia, the University of Aalborg and the University of Cambridge. Experimental results show that the proposed method achieves up to 99.5% accuracy with just six on-line statistical features, outperforming other existing approaches in terms of both accuracy and number of features.
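The abstract names one-way analysis of variance (ANOVA) as one of the analytic methods used to judge candidate features. As a rough illustration only, and not the authors' pipeline (which was built in WEKA), the sketch below computes the one-way ANOVA F statistic of each flow feature across traffic classes and ranks features by it; a large F means the class means differ strongly relative to the spread inside each class. Ranking by F is one common way to use ANOVA for feature screening, and the paper may apply it differently. The feature names (`mean_pkt_size`, `mean_iat`) and all flow values are hypothetical toy data, not taken from the Brescia, Aalborg or Cambridge traces.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for one feature, with samples split by class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = fmean(x for g in groups for x in g)
    # Between-class sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows: List[Dict[str, float]], labels: List[str]) -> List[Tuple[str, float]]:
    """Return (feature, F) pairs sorted from most to least class-separating."""
    classes = sorted(set(labels))
    scores = []
    for name in rows[0]:
        # Group this feature's values by traffic class.
        groups = [[r[name] for r, y in zip(rows, labels) if y == c] for c in classes]
        scores.append((name, anova_f(groups)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical per-flow features computed from the first few packets of each flow.
rows = [
    {"mean_pkt_size": 1200.0, "mean_iat": 0.05},
    {"mean_pkt_size": 1100.0, "mean_iat": 0.30},
    {"mean_pkt_size": 1300.0, "mean_iat": 0.10},
    {"mean_pkt_size": 200.0, "mean_iat": 0.20},
    {"mean_pkt_size": 300.0, "mean_iat": 0.15},
    {"mean_pkt_size": 250.0, "mean_iat": 0.25},
]
labels = ["p2p", "p2p", "p2p", "other", "other", "other"]

ranked = rank_features(rows, labels)
top_k = [name for name, _ in ranked[:1]]  # keep the k best features (k = 1 here)
print(ranked)
print(top_k)
```

In this toy data the packet-size feature separates the two classes far better than the inter-arrival-time feature, so it is ranked first. A filter score like this is cheap to compute, which is one reason such scores are often paired with a classifier-based evaluation before the final, small feature set is fixed.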

Author information

Corresponding author

Correspondence to Bushra Mohammed Ali Abdalla.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ali Abdalla, B.M., Jamil, H.A., Hamdan, M., Bassi, J.S., Ismail, I., Marsono, M.N. (2017). Multi-stage Feature Selection for On-Line Flow Peer-to-Peer Traffic Identification. In: Mohamed Ali, M., Wahid, H., Mohd Subha, N., Sahlan, S., Md. Yunus, M., Wahap, A. (eds) Modeling, Design and Simulation of Systems. AsiaSim 2017. Communications in Computer and Information Science, vol 752. Springer, Singapore. https://doi.org/10.1007/978-981-10-6502-6_44

  • DOI: https://doi.org/10.1007/978-981-10-6502-6_44

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6501-9

  • Online ISBN: 978-981-10-6502-6

  • eBook Packages: Computer Science, Computer Science (R0)