Abstract
Machine learning is rapidly becoming one of the most important technologies for malware traffic detection, since the continuous evolution of malware requires constant adaptation and the ability to generalize [20]. However, network traffic datasets are usually oversized and contain redundant and irrelevant information, which may dramatically increase the computational cost and decrease the accuracy of most classifiers, with the risk of introducing further noise.
We propose two novel dataset optimization strategies which exploit and combine several state-of-the-art approaches in order to achieve an effective optimization of the network traffic datasets used to train malware detectors. The first approach is a feature selection technique based on mutual information measures and sensibility enhancement. The second is a dimensionality reduction technique based on autoencoders. Both approaches have been applied experimentally to the MTA-KDD’19 dataset, and the optimized results were evaluated and compared using a Multi Layer Perceptron as the machine learning model for malware detection.
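As a rough illustration of the idea behind mutual-information-based feature selection, the sketch below ranks discrete features by their empirical mutual information with the class label and keeps the top k. It is a minimal example under stated assumptions, not the selection technique proposed in the paper: it scores relevance only, it does not model redundancy between features or the sensibility enhancement step, and the function names (`mutual_information`, `select_top_k`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_top_k(columns, labels, k):
    """Score each feature by I(feature; label) and return the k best names."""
    scores = {name: mutual_information(col, labels)
              for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy, made-up data (hypothetical feature names; not taken from MTA-KDD'19).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "partial":     [0, 0, 0, 1, 1, 1, 1, 0],  # agrees with the label on 6 of 8 rows
    "noise":       [0, 1, 0, 1, 0, 1, 0, 1],  # statistically independent of the label
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)
```

On this toy data the feature that copies the label scores 1 bit, the independent feature scores 0, and the partially aligned feature falls in between, so the two kept features are the informative and the partial one. Real traffic features are mostly continuous and would need discretization or a continuous estimator before a score like this could be computed.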
References
Akadi, A.E., Ouardighi, A.E., Aboutajdine, D.: A powerful feature selection approach based on mutual information (2008)
Balagani, K.S., Phoha, V.V.: On the feature selection criterion based on an approximation of multidimensional mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 32(7), 1342–1343 (2010)
Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5, 537–550 (1994). https://doi.org/10.1109/72.298224
Bennasar, M., Hicks, Y., Setchi, R.: Feature selection using joint mutual information maximisation. Expert Syst. Appl. 42(22), 8520–8532 (2015). https://doi.org/10.1016/j.eswa.2015.07.007
Borges, H.B., Nievola, J.C.: Comparing the dimensionality reduction methods in gene expression databases. Expert Syst. Appl. 39(12), 10780–10795 (2012)
Brown, G., Pocock, A., Zhao, M.J., Luján, M.: Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012)
Colquhoun, D.: An investigation of the false discovery rate and the misinterpretation of p-values. Roy. Soc. Open Sci. 1(3) (2014). https://doi.org/10.1098/rsos.140216
Fleuret, F.: Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res. 5, 1531–1555 (2004)
James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. STS, vol. 103. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-7138-7
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.: Feature extraction: foundations and applications (2006)
Hamon, J.: Optimisation combinatoire pour la sélection de variables en régression en grande dimension : Application en génétique animale (Combinatorial optimization for variable selection in high-dimensional regression: application in animal genetics) (2013)
Han, K., Li, C., Shi, X.: Autoencoder feature selector. ArXiv abs/1710.08310 (2017)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and Helmholtz free energy. In: Proceedings of the 6th International Conference on Neural Information Processing Systems, pp. 3–10. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Huang, Y., Xu, D., Nie, F.: Semi-supervised dimension reduction using trace ratio criterion. IEEE Trans. Neural Netw. Learn. Syst. 23(3), 519–526 (2012). https://doi.org/10.1109/TNNLS.2011.2178037
Hughes, G.: On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 14, 55–63 (1968). https://doi.org/10.1109/TIT.1968.1054102
Letteri, I.: MTA-KDD’19 dataset (2019). https://github.com/IvanLetteri/MTA-KDD-19
Letteri, I., Di Cecco, A., Della Penna, G.: Optimized MTA-KDD’19 datasets (2020). https://github.com/IvanLetteri/RRwOptimizedMTAKDD19
Letteri, I., Della Penna, G., Caianiello, P.: Feature selection strategies for HTTP botnet traffic detection. In: 2019 IEEE European Symposium on Security and Privacy Workshops, EuroS&P Workshops 2019, Stockholm, Sweden, 17–19 June 2019, pp. 202–210. IEEE (2019). https://doi.org/10.1109/EuroSPW.2019.00029
Letteri, I., Della Penna, G., De Gasperis, G.: Botnet detection in software defined networks by deep learning techniques. In: Castiglione, A., Pop, F., Ficco, M., Palmieri, F. (eds.) CSS 2018. LNCS, vol. 11161, pp. 49–62. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01689-0_4
Letteri, I., Della Penna, G., De Gasperis, G.: Security in the internet of things: botnet detection in software-defined networks by deep learning techniques. Int. J. High Perf. Comput. Netw. 15(3–4), 170–182 (2020). https://doi.org/10.1504/IJHPCN.2019.106095
Letteri, I., Della Penna, G., Di Vita, L., Grifa, M.T.: MTA-KDD’19: a dataset for malware traffic detection. In: Loreti, M., Spalazzi, L. (eds.) Proceedings of the Fourth Italian Conference on Cyber Security, Ancona, Italy, 4–7 February 2020, CEUR Workshop Proceedings, vol. 2597, pp. 153–165. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2597/paper-14.pdf
Lu, Q., Qiao, X.: Sparse Fisher’s linear discriminant analysis for partially labeled data. Stat. Anal. Data Min. 11, 17–31 (2018)
Numpy: numpy.random.uniform. https://numpy.org/numpy.random.uniform.html
Pasunuri, R., Venkaiah, V.C.: A computationally efficient data-dependent projection for dimensionality reduction. In: Bansal, J.C., Gupta, M.K., Sharma, H., Agarwal, B. (eds.) ICCIS 2019. LNNS, vol. 120, pp. 339–352. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-3325-9_26
Phuong, T.M., Lin, Z., Altman, R.B.: Choosing SNPs using feature selection. In: Proceedings, IEEE Computational Systems Bioinformatics Conference, pp. 301–309 (2005). https://doi.org/10.1109/csb.2005.22
Scikit-Learn. https://scikit-learn.org
Shahana, A.H., Preeja, V.: Survey on feature subset selection for high dimensional data. In: 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1–4 (2016)
Sorzano, C.O.S., Vargas, J., Pascual-Montano, A.D.: A survey of dimensionality reduction techniques. ArXiv abs/1403.2877 (2014)
Wang, G., Lochovsky, F.: Feature selection with conditional mutual information maximin in text categorization, pp. 342–349 (2004). https://doi.org/10.1145/1031171.1031241
Wang, L., Lei, Y., Zeng, Y., Tong, L., Yan, B.: Principal feature analysis: a multivariate feature selection method for fMRI data. Comput. Math. Methods Med. 2013, 645921 (2013). https://doi.org/10.1155/2013/645921
Wang, S., Ding, Z., Fu, Y.: Feature selection guided auto-encoder. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, pp. 2725–2731. AAAI Press (2017)
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Letteri, I., Di Cecco, A., Della Penna, G. (2022). New Optimization Approaches in Malware Traffic Analysis. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_4
DOI: https://doi.org/10.1007/978-3-030-95467-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95466-6
Online ISBN: 978-3-030-95467-3
eBook Packages: Computer Science, Computer Science (R0)