Static PE Malware Detection Using Gradient Boosting Decision Trees Algorithm

Pham, Huu-Danh; Le, Tuan Dinh; Vu, Thanh Nguyen

doi:10.1007/978-3-030-03192-3_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11251)

Included in the following conference series:

International Conference on Future Data and Security Engineering

Abstract

Static malware detection is an essential layer in a security suite: it attempts to classify samples as malicious or benign before execution. However, most related work has scalability issues. For example, methods using neural networks usually take a long time to train [13], and others use imbalanced datasets [17, 20], which makes their validation metrics misleading in practice. In this study, we apply a static malware detection method based on Portable Executable (PE) analysis and the Gradient Boosting Decision Tree algorithm. We reduce the training time by appropriately reducing the feature dimension. The experimental results show that the proposed method achieves a detection rate of up to 99.394% at a 1% false alarm rate, and a detection rate of 97.572% at a false alarm rate below 0.1%, based on more than 600,000 training and 200,000 testing samples from the Endgame Malware BEnchmark for Research (EMBER) dataset [1].
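The reported figures pair a detection rate with a fixed false alarm rate. As a minimal sketch of how such a metric can be computed from classifier scores, the Python function below picks the lowest threshold that keeps false alarms on benign samples within a budget, then measures the fraction of malicious samples flagged. The function name, its parameters, and the toy scores are hypothetical and are not taken from the paper; the authors' actual evaluation code may differ.

```python
def detection_rate_at_far(benign_scores, malicious_scores, max_far):
    """Return (threshold, detection_rate) for a false alarm budget.

    A sample is flagged as malicious when its score is strictly greater
    than the threshold. Hypothetical helper for illustration only.
    """
    if not benign_scores or not malicious_scores:
        raise ValueError("both score lists must be non-empty")

    # Number of benign samples we may misclassify without exceeding max_far.
    # The small epsilon guards against float error in the product.
    allowed = int(max_far * len(benign_scores) + 1e-9)

    ranked = sorted(benign_scores, reverse=True)
    if allowed >= len(ranked):
        threshold = float("-inf")  # budget allows flagging every sample
    else:
        # At most `allowed` benign scores are strictly above ranked[allowed].
        threshold = ranked[allowed]

    detected = sum(1 for s in malicious_scores if s > threshold)
    return threshold, detected / len(malicious_scores)


# Toy data: 100 benign scores spread over [0, 0.99], four malicious scores.
benign = [i / 100 for i in range(100)]
malicious = [0.5, 0.985, 0.995, 1.0]
threshold, rate = detection_rate_at_far(benign, malicious, max_far=0.01)
print(f"threshold={threshold}, detection rate={rate:.3f}")
```

Fixing the false alarm rate first and reading off the detection rate second makes results comparable across models, since each model's raw score scale no longer matters.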

References

1. Anderson, H.S., Roth, P.: Ember: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)
2. Burges, C.J.: From ranknet to lambdarank to lambdamart: an overview. Technical report, June 2010
3. Chen, Q., Bridges, R.A.: Automated behavioral analysis of malware: a case study of WannaCry ransomware. CoRR (2017)
4. Chinchor, N.: MUC-4 evaluation metrics. In: Proceedings of the Fourth Message Understanding Conference (MUC-4), p. 22. Morgan Kaufman Publishers (1992)
5. Cohen, F.: Computer viruses: theory and experiments. Comput. Secur. 6(1), 22–35 (1987)
6. Crowe, J.: Security false positives cost companies $1.37 million a year on average (2017)
7. Egele, M., Scholte, T., Kirda, E., Kruegel, C.: A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. (CSUR) 44(2), 6 (2012)
8. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154 (2017)
9. Kephart, J.O., et al.: Biologically inspired defenses against computer viruses. In: IJCAI (1), pp. 985–996 (1995)
10. Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)
11. Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference. ACSAC 2007, pp. 421–430. IEEE (2007)
12. Nguyen, V.T., Nguyen, T.T., Mai, K.T., Le, T.D.: A combination of negative selection algorithm and artificial immune network for virus detection. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds.) FDSE 2014. LNCS, vol. 8860, pp. 97–106. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12778-1_8
13. Raff, E., et al.: Malware detection by eating a whole EXE (2017). arXiv preprint arXiv:1710.09435
14. Raff, E., Sylvester, J., Nicholas, C.: Learning the PE header, malware detection with minimal domain knowledge. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 121–132. ACM (2017)
15. Richardson, M., Dominowska, E., Ragno, R.: Predicting clicks: estimating the click through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web, pp. 521–530. ACM (2007)
16. Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge (2018). arXiv preprint arXiv:1802.10135
17. Saxe, J., Berlin, K.: Deep neural network-based malware detection using two-dimensional binary program features. In: 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), pp. 11–20 (2015)
18. Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 38–49 (2001)
19. Van Nhuong, N., Nhi, V.T.Y., Cam, N.T., Phu, M.X., Tan, C.D.: Semantic set analysis for malware detection. In: Saeed, K., Snášel, V. (eds.) CISIM 2014. LNCS, vol. 8838, pp. 688–700. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45237-0_62
20. Vu, T.N., Nguyen, T.T., Phan Trung, H., Do Duy, T., Van, K.H., Le, T.D.: Metamorphic malware detection by PE analysis with the longest common sequence. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E.J. (eds.) FDSE 2017. LNCS, vol. 10646, pp. 262–272. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70004-5_18

Acknowledgment

This research is funded by Vietnam National University, Ho Chi Minh City (VNUHCM) under grant number C2018–26–06.

Author information

Authors and Affiliations

University of Information Technology, Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Vietnam
Huu-Danh Pham & Thanh Nguyen Vu
Long An University of Economics and Industry, Tan An, Long An Province, Vietnam
Tuan Dinh Le

Corresponding author

Correspondence to Thanh Nguyen Vu.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh, Vietnam
Tran Khanh Dang
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Roland Wagner
Ho Chi Minh City University of Technology, Ho Chi Minh, Vietnam
Nam Thoai
Hosei University, Tokyo, Japan
Makoto Takizawa

About this paper

Cite this paper

Pham, HD., Le, T.D., Vu, T.N. (2018). Static PE Malware Detection Using Gradient Boosting Decision Trees Algorithm. In: Dang, T., Küng, J., Wagner, R., Thoai, N., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2018. Lecture Notes in Computer Science, vol 11251. Springer, Cham. https://doi.org/10.1007/978-3-030-03192-3_17

DOI: https://doi.org/10.1007/978-3-030-03192-3_17
Published: 27 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03191-6
Online ISBN: 978-3-030-03192-3
eBook Packages: Computer Science, Computer Science (R0)
