Abstract
Static malware detection is an essential layer in a security suite, which attempts to classify samples as malicious or benign before execution. However, most of the related works incur the scalability issues, for examples, methods using neural networks usually take a lot of training time [13], or use imbalanced datasets [17, 20], which makes validation metrics misleading in reality. In this study, we apply a static malware detection method by Portable Executable analysis and Gradient Boosting Decision Tree algorithm. We manage to reduce the training time by appropriately reducing the feature dimension. The experiment results show that our proposed method can achieve up to 99.394% detection rate at 1% false alarm rate, and score results in less than 0.1% false alarm rate at a detection rate 97.572%, based on more than 600,000 training and 200,000 testing samples from Endgame Malware BEnchmark for Research (EMBER) dataset [1].
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Anderson, H.S., Roth, P.: Ember: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)
Burges, C.J.: From ranknet to lambdarank to lambdamart: an overview. Technical report, June 2010
Chen, Q., Bridges, R.A.: Automated behavioral analysis of malware a case study of WannaCry ransomware. CoRR (2017)
Chinchor, N.: MUC-4 evaluation metrics. In: Proceedings of the Fourth Message Understanding Conference (MUC-4), p. 22. Morgan Kaufman Publishers (1992)
Cohen, F.: Computer viruses: theory and experiments. Comput. Secur. 6(1), 22–35 (1987)
Crowe, J.: Security false positives cost companies $1.37 million a year on average (2017)
Egele, M., Scholte, T., Kirda, E., Kruegel, C.: A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. (CSUR) 44(2), 6 (2012)
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154 (2017)
Kephart, J.O., et al.: Biologically inspired defenses against computer viruses. In: IJCAI (1), pp. 985–996 (1995)
Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference. ACSAC 2007, pp. 421–430. IEEE (2007)
Nguyen, V.T., Nguyen, T.T., Mai, K.T., Le, T.D.: A combination of negative selection algorithm and artificial immune network for virus detection. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds.) FDSE 2014. LNCS, vol. 8860, pp. 97–106. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12778-1_8
Raff, E., et al.: Malware detection by eating a whole EXE (2017). arXiv preprint arXiv:1710.09435
Raff, E., Sylvester, J., Nicholas, C.: Learning the PE header, malware detection with minimal domain knowledge. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 121–132. ACM (2017)
Richardson, M., Dominowska, E., Ragno, R.: Predicting clicks: estimating the click through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web, pp. 521–530. ACM (2007)
Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge (2018). arXiv preprint arXiv:1802.10135
Saxe, J., Berlin, K.: Deep neural network-based malware detection using two-dimensional binary program features. In: 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), pp. 11–20 (2015)
Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 38–49 (2001)
Van Nhuong, N., Nhi, V.T.Y., Cam, N.T., Phu, M.X., Tan, C.D.: Semantic set analysis for malware detection. In: Saeed, K., Snášel, V. (eds.) CISIM 2014. LNCS, vol. 8838, pp. 688–700. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45237-0_62
Vu, T.N., Nguyen, T.T., Phan Trung, H., Do Duy, T., Van, K.H., Le, T.D.: Metamorphic malware detection by PE analysis with the longest common sequence. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E.J. (eds.) FDSE 2017. LNCS, vol. 10646, pp. 262–272. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70004-5_18
Acknowledgment
This research is funded by Vietnam National University, Ho Chi Minh City (VNUHCM) under grant number C2018–26–06.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Pham, HD., Le, T.D., Vu, T.N. (2018). Static PE Malware Detection Using Gradient Boosting Decision Trees Algorithm. In: Dang, T., Küng, J., Wagner, R., Thoai, N., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2018. Lecture Notes in Computer Science(), vol 11251. Springer, Cham. https://doi.org/10.1007/978-3-030-03192-3_17
Download citation
DOI: https://doi.org/10.1007/978-3-030-03192-3_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03191-6
Online ISBN: 978-3-030-03192-3
eBook Packages: Computer ScienceComputer Science (R0)