Malware analysis and detection techniques have evolved over the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The rapid growth in the variety and number of malware species has made it very difficult for forensic investigators to respond in a timely manner. Therefore, Machine Learning (ML) aided malware analysis has become a necessity for automating different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threat Intelligence (CTI), rather than resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classifying static characteristics of 32-bit malicious Portable Executable (PE32) Windows files, and we develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be used to extract and analyse a variety of static characteristics of PE binaries, and we evaluate the accuracy and practical generalization of these techniques. Finally, the results of an experimental study of all the methods on a common dataset are given to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics.
- Machine learning
- Static analysis
- Artificial intelligence
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Shalaginov, A., Banin, S., Dehghantanha, A., Franke, K. (2018). Machine Learning Aided Static Malware Analysis: A Survey and Tutorial. In: Dehghantanha, A., Conti, M., Dargahi, T. (eds) Cyber Threat Intelligence. Advances in Information Security, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-319-73951-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73950-2
Online ISBN: 978-3-319-73951-9
eBook Packages: Computer Science, Computer Science (R0)