Static PE Malware Detection Using Gradient Boosting Decision Trees Algorithm

  • Conference paper
Future Data and Security Engineering (FDSE 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11251)

Abstract

Static malware detection is an essential layer in a security suite: it attempts to classify samples as malicious or benign before execution. However, most related work either suffers from scalability issues (for example, methods using neural networks usually take a long time to train [13]) or uses imbalanced datasets [17, 20], which makes validation metrics misleading in practice. In this study, we apply a static malware detection method based on Portable Executable (PE) analysis and the Gradient Boosting Decision Tree algorithm. We reduce the training time by appropriately reducing the feature dimension. The experimental results show that our proposed method achieves a detection rate of up to 99.394% at a 1% false alarm rate, and a detection rate of 97.572% at a false alarm rate below 0.1%, based on more than 600,000 training and 200,000 testing samples from the Endgame Malware BEnchmark for Research (EMBER) dataset [1].
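The abstract reports detection rates at fixed false alarm rates. As a minimal sketch of how such a figure can be computed (this is not the authors' evaluation code; the function name, the label convention of 1 for malicious and 0 for benign, and the toy scores are all assumptions made for illustration), one can scan candidate thresholds from high to low and keep the lowest one whose false alarm rate stays within the budget:

```python
def detection_rate_at_far(labels, scores, max_far):
    """Pick the lowest score threshold whose false alarm rate is <= max_far.

    labels: 1 for malicious, 0 for benign (assumed convention).
    scores: model outputs, higher means more likely malicious.
    A sample is flagged as malicious when score >= threshold.
    Returns (threshold, detection_rate, false_alarm_rate).
    """
    benign = [s for y, s in zip(labels, scores) if y == 0]
    malicious = [s for y, s in zip(labels, scores) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Start from "flag nothing": zero detections and zero false alarms.
    best = (float("inf"), 0.0, 0.0)

    # Candidate thresholds are the distinct scores, highest first.
    # Lowering the threshold can only raise both rates, so we can stop
    # as soon as the false alarm budget is exceeded.
    for threshold in sorted(set(scores), reverse=True):
        far = sum(s >= threshold for s in benign) / len(benign)
        if far > max_far:
            break
        dr = sum(s >= threshold for s in malicious) / len(malicious)
        best = (threshold, dr, far)
    return best


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    toy_scores = [0.1, 0.2, 0.3, 0.9, 0.4, 0.8, 0.95, 0.99]
    print(detection_rate_at_far(toy_labels, toy_scores, max_far=0.25))
```

This simple scan recounts both classes for every candidate threshold, which is fine for a toy example; on a test set of the size described in the abstract, a single pass over the scores sorted once would be the more practical choice.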

References

  1. Anderson, H.S., Roth, P.: Ember: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)

  2. Burges, C.J.: From ranknet to lambdarank to lambdamart: an overview. Technical report, June 2010

  3. Chen, Q., Bridges, R.A.: Automated behavioral analysis of malware: a case study of WannaCry ransomware. CoRR (2017)

  4. Chinchor, N.: MUC-4 evaluation metrics. In: Proceedings of the Fourth Message Understanding Conference (MUC-4), p. 22. Morgan Kaufmann Publishers (1992)

  5. Cohen, F.: Computer viruses: theory and experiments. Comput. Secur. 6(1), 22–35 (1987)

  6. Crowe, J.: Security false positives cost companies $1.37 million a year on average (2017)

  7. Egele, M., Scholte, T., Kirda, E., Kruegel, C.: A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. (CSUR) 44(2), 6 (2012)

  8. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154 (2017)

  9. Kephart, J.O., et al.: Biologically inspired defenses against computer viruses. In: IJCAI (1), pp. 985–996 (1995)

  10. Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. J. Mach. Learn. Res. 7, 2721–2744 (2006)

  11. Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference. ACSAC 2007, pp. 421–430. IEEE (2007)

  12. Nguyen, V.T., Nguyen, T.T., Mai, K.T., Le, T.D.: A combination of negative selection algorithm and artificial immune network for virus detection. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds.) FDSE 2014. LNCS, vol. 8860, pp. 97–106. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12778-1_8

  13. Raff, E., et al.: Malware detection by eating a whole EXE. arXiv preprint arXiv:1710.09435 (2017)

  14. Raff, E., Sylvester, J., Nicholas, C.: Learning the PE header, malware detection with minimal domain knowledge. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 121–132. ACM (2017)

  15. Richardson, M., Dominowska, E., Ragno, R.: Predicting clicks: estimating the click through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web, pp. 521–530. ACM (2007)

  16. Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., Ahmadi, M.: Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135 (2018)

  17. Saxe, J., Berlin, K.: Deep neural network-based malware detection using two-dimensional binary program features. In: 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), pp. 11–20 (2015)

  18. Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings of the 2001 IEEE Symposium on Security and Privacy, pp. 38–49 (2001)

  19. Van Nhuong, N., Nhi, V.T.Y., Cam, N.T., Phu, M.X., Tan, C.D.: Semantic set analysis for malware detection. In: Saeed, K., Snášel, V. (eds.) CISIM 2014. LNCS, vol. 8838, pp. 688–700. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45237-0_62

  20. Vu, T.N., Nguyen, T.T., Phan Trung, H., Do Duy, T., Van, K.H., Le, T.D.: Metamorphic malware detection by PE analysis with the longest common sequence. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E.J. (eds.) FDSE 2017. LNCS, vol. 10646, pp. 262–272. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70004-5_18


Acknowledgment

This research is funded by Vietnam National University, Ho Chi Minh City (VNUHCM) under grant number C2018–26–06.

Author information

Corresponding author

Correspondence to Thanh Nguyen Vu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Pham, HD., Le, T.D., Vu, T.N. (2018). Static PE Malware Detection Using Gradient Boosting Decision Trees Algorithm. In: Dang, T., Küng, J., Wagner, R., Thoai, N., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2018. Lecture Notes in Computer Science, vol 11251. Springer, Cham. https://doi.org/10.1007/978-3-030-03192-3_17

  • DOI: https://doi.org/10.1007/978-3-030-03192-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03191-6

  • Online ISBN: 978-3-030-03192-3

  • eBook Packages: Computer Science, Computer Science (R0)
