
An Empirical Investigation of Filter Attribute Selection Techniques for High-Speed Network Traffic Flow Classification

Published in: Wireless Personal Communications

Abstract

Attribute selection is an important methodology for data mining problems. Removing irrelevant and redundant attributes from the original data set can greatly simplify the construction of classifier models. In this paper, we apply attribute selection techniques to network traffic flow classification and conduct experiments on real network traffic data collected from the Internet in China. The results show that applying an appropriate attribute selection method can simplify the network traffic classifier while achieving satisfactory classification accuracy.
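As a loose illustration of the filter-based workflow the abstract describes (not the paper's actual pipeline or data), the sketch below scores each attribute independently of the classifier, keeps the top-ranked attributes, and trains a decision tree on the reduced set. The synthetic data, attribute counts, and scikit-learn components are assumptions made purely for demonstration.

```python
# Minimal sketch of filter attribute selection for flow classification.
# All data and parameter choices here are hypothetical stand-ins.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for per-flow attributes (packet sizes, inter-arrival
# times, flow durations, etc.) labelled with application classes.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           n_redundant=20, n_classes=4, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Filter attribute selection: rank attributes by a score computed
# independently of any classifier, then keep the top k.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Train a decision-tree classifier on the reduced attribute set and
# evaluate classification accuracy on held-out flows.
clf = DecisionTreeClassifier(random_state=0).fit(X_train_sel, y_train)
print("Accuracy with 10 selected attributes:",
      accuracy_score(y_test, clf.predict(X_test_sel)))
```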



Author information

Correspondence to Jie Yang.


About this article

Cite this article

Yang, J., Ma, J., Cheng, G. et al. An Empirical Investigation of Filter Attribute Selection Techniques for High-Speed Network Traffic Flow Classification. Wireless Pers Commun 66, 541–558 (2012). https://doi.org/10.1007/s11277-012-0735-y

