Detection of Malicious PowerShell Using Word-Level Language Models

Advances in Information and Computer Security (IWSEC 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12231)

Abstract

Cybercriminals increasingly abuse legitimate tools already installed on target computers to carry out attacks. In particular, malicious use of PowerShell, which is provided by Microsoft, has increased every year and has become a threat. Previous work proposed detecting malicious PowerShell commands with character-level deep learning, combining traditional natural language processing with character-level convolutional neural networks. That method, however, requires time for dynamic analysis. This paper proposes a method that classifies unknown PowerShell scripts without dynamic analysis. The method extracts feature vectors from malicious and benign PowerShell scripts using word-level language models and uses them for classification. The datasets, which are imbalanced, were built from benign and malicious PowerShell scripts obtained from Hybrid Analysis and from benign PowerShell scripts obtained from GitHub. The experimental results show that the combination of LSI and XGBoost yields the highest detection rate. The maximum accuracy is approximately 0.95 on the imbalanced dataset. Furthermore, in a time series analysis, over 50% of unknown malicious PowerShell scripts were detected without dynamic analysis.
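
As a rough illustration of the word-level preprocessing such a method relies on, the following Python sketch splits PowerShell scripts into word tokens and builds term-count vectors. It is not the authors' implementation: the token pattern, the function names (tokenize, build_vocabulary, count_vector), and the two sample scripts are assumptions made for this example. In a pipeline like the one described in the abstract, vectors of this kind would then be passed to a language model such as LSI and on to a classifier such as XGBoost; neither step is shown here.

```python
import re
from collections import Counter

# Assumed token pattern (not taken from the paper): a run of letters, digits,
# underscores, or hyphens, so cmdlet names such as "Invoke-Expression" stay
# whole, optionally prefixed by "$" for variables.
TOKEN_RE = re.compile(r"\$?[A-Za-z_][A-Za-z0-9_-]*")


def tokenize(script):
    """Split a PowerShell script into lowercase word-level tokens."""
    return [token.lower() for token in TOKEN_RE.findall(script)]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    distinct = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: index for index, tok in enumerate(distinct)}


def count_vector(tokens, vocab):
    """Return a term-count vector; tokens missing from vocab are ignored."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical sample scripts, used only to exercise the functions.
scripts = [
    "Get-ChildItem -Path $home | Sort-Object Length",
    "IEX (New-Object Net.WebClient).DownloadString($url)",
]

token_lists = [tokenize(s) for s in scripts]
vocab = build_vocabulary(token_lists)
matrix = [count_vector(tokens, vocab) for tokens in token_lists]

for tokens, row in zip(token_lists, matrix):
    print(tokens, row)
```

Each row of matrix is one script expressed over a shared vocabulary, which is the document-term form that word-level models typically take as input. Because this step only reads the script text, it involves no execution of the script, in line with the abstract's goal of avoiding dynamic analysis.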

Author information

Corresponding author

Correspondence to Yui Tajiri.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tajiri, Y., Mimura, M. (2020). Detection of Malicious PowerShell Using Word-Level Language Models. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science, vol 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_3

  • DOI: https://doi.org/10.1007/978-3-030-58208-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58207-4

  • Online ISBN: 978-3-030-58208-1

  • eBook Packages: Computer Science, Computer Science (R0)
