A Four-Stage Hybrid Feature Subset Selection Approach for Network Traffic Classification Based on Full Coverage
Classifying traffic flows is of significant interest in network management and security. Feature subset selection is an essential step in machine-learning-based traffic classification, where it is used to reduce dimensionality and remove redundant information. This paper proposes a four-stage hybrid feature subset selection method that improves the classification performance of hybrid methods at a low evaluation cost. The proposed algorithm processes features at the block level and evaluates every feature, including the remaining ones that provide little information on their own, so that the interactions among all features can be exploited. In addition, a wrapper-based selection is applied in the last stage to further remove redundant features. Performance is examined in two groups of experiments. Our theoretical analysis and experimental observations indicate that the proposed method selects feature subsets with improved classification performance on every index while consuming fewer evaluations. Moreover, the evaluation cost remains low and stable across different block sizes.
Keywords: Full coverage, Machine learning, Hybrid feature subset selection, Network traffic classification, Network management
The authors gratefully acknowledge the financial support from the Natural Science Foundation of Zhangzhou, Fujian (Project No. ZZ2018J22).