Behavior-based features model for malware detection

  • Hisham Shehata Galal
  • Yousef Bassyouni Mahdy
  • Mohammed Ali Atiea
Original Paper


The sharing of malicious code libraries and techniques over the Internet has increased the release of new malware variants at an unprecedented rate. Malware variants share similar behaviors, yet they differ in syntactic structure because of obfuscation and code-changing techniques such as polymorphism and metamorphism. These structural differences pose a serious problem for signature-based detection techniques, whereas the similar behaviors and actions that variants exhibit are a distinguishing feature that behavior-based techniques can use to detect them. Malware instances also depend largely on API calls provided by the operating system to achieve their malicious tasks. Therefore, behavior-based detection techniques that utilize API calls are promising for the detection of malware variants. In this paper, we propose a behavior-based features model that describes the malicious actions exhibited by a malware instance. To extract the proposed model, we first perform dynamic analysis on a relatively recent malware dataset inside a controlled virtual environment and capture traces of the API calls invoked by malware instances. The traces are then generalized into high-level features that we refer to as actions. We assess the viability of actions with various classification algorithms, such as decision trees, random forests, and support vector machines. The experimental results demonstrate that the classifiers attain high accuracy and satisfactory results in the detection of malware variants.
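
The generalization of API call traces into actions can be illustrated with a minimal Python sketch. It is not the authors' model: the pattern table `ACTION_PATTERNS`, the action names, and the helper functions `extract_actions` and `to_feature_vector` are hypothetical and chosen only for illustration. The sketch assumes that an action corresponds to a fixed run of consecutive API calls and that each sample is encoded as a binary vector over the known actions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from a run of consecutive API calls to an action name.
# These patterns are illustrative assumptions, not the paper's actual model.
ACTION_PATTERNS: Dict[Tuple[str, ...], str] = {
    ("CreateFileW", "WriteFile", "CloseHandle"): "write_file",
    ("RegOpenKeyExW", "RegSetValueExW"): "modify_registry",
    ("OpenProcess", "WriteProcessMemory", "CreateRemoteThread"): "inject_code",
}

# Fixed, sorted list of action names so every sample gets the same vector layout.
VOCABULARY: List[str] = sorted(set(ACTION_PATTERNS.values()))


def extract_actions(trace: Sequence[str]) -> List[str]:
    """Scan a trace left to right and emit one action per matched pattern."""
    actions: List[str] = []
    i = 0
    while i < len(trace):
        for pattern, action in ACTION_PATTERNS.items():
            if tuple(trace[i:i + len(pattern)]) == pattern:
                actions.append(action)
                i += len(pattern)  # consume the matched calls
                break
        else:
            i += 1  # no pattern starts at this call; skip it
    return actions


def to_feature_vector(actions: Sequence[str],
                      vocabulary: Sequence[str] = VOCABULARY) -> List[int]:
    """Binary encoding: 1 if the action occurs at least once in the sample."""
    present = set(actions)
    return [1 if name in present else 0 for name in vocabulary]


# A made-up trace, as might be captured by an API hooking tool.
trace = [
    "CreateFileW", "WriteFile", "CloseHandle",
    "Sleep",
    "OpenProcess", "WriteProcessMemory", "CreateRemoteThread",
]
actions = extract_actions(trace)
features = to_feature_vector(actions)
print(actions)   # ['write_file', 'inject_code']
print(features)  # [1, 0, 1] over ['inject_code', 'modify_registry', 'write_file']
```

Vectors of this kind, one per malware or benign sample, are the sort of input that a decision tree, random forest, or support vector machine classifier could be trained on.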


Keywords (added by machine, not by the authors): Hidden Markov Model, Virtual Machine, Heuristic Function, Control Flow Graph, Benign Sample



Copyright information

© Springer-Verlag France 2015

Authors and Affiliations

  • Hisham Shehata Galal (1, corresponding author)
  • Yousef Bassyouni Mahdy (1)
  • Mohammed Ali Atiea (1)
  1. Faculty of Computers and Information, Assiut University, Assiut, Egypt
