OPEM: A Static-Dynamic Approach for Machine-Learning-Based Malware Detection

  • Igor Santos
  • Jaime Devesa
  • Félix Brezo
  • Javier Nieves
  • Pablo Garcia Bringas
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 189)


Malware is any computer software that is potentially harmful to computers and networks. The amount of malware grows every year and poses a serious global security threat. Signature-based detection is the most widely used method in commercial antivirus software; however, it consistently fails to detect new malware. Supervised machine learning has been adopted to address this issue. Supervised malware detectors use two types of features: (i) static features and (ii) dynamic features. Static features are extracted without executing the sample, whereas dynamic features require executing it. Both approaches have advantages and disadvantages. In this paper, we propose, for the first time, a hybrid unknown malware detector, OPEM, which combines the frequency of occurrence of operational codes (obtained statically) with information from the execution trace of an executable (obtained dynamically). We show that this hybrid approach outperforms both the static and the dynamic approach when each is run separately.
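The core idea of a hybrid detector, feeding one supervised classifier with both statically and dynamically obtained features, can be illustrated with a minimal Python sketch. It assumes that static features are relative opcode frequencies over a small fixed vocabulary and that dynamic features are binary flags for behaviours seen in an execution trace. The vocabularies, the function names (`opcode_frequencies`, `trace_indicators`, `hybrid_vector`) and the flag encoding are all hypothetical; they are not OPEM's actual feature set or weighting scheme.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical opcode vocabulary for the static view (illustrative only,
# not the feature set used by OPEM).
OPCODE_VOCAB: List[str] = ["mov", "push", "call", "xor", "jmp"]

# Hypothetical behaviour vocabulary for the dynamic view (illustrative only).
BEHAVIOUR_VOCAB: List[str] = ["creates_file", "modifies_registry", "opens_socket"]


def opcode_frequencies(opcodes: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary opcode in a disassembled sample."""
    total = len(opcodes)
    if total == 0:
        # An empty disassembly yields an all-zero static vector.
        return [0.0] * len(vocab)
    counts = Counter(opcodes)
    return [counts[op] / total for op in vocab]


def trace_indicators(trace: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """1.0 if a behaviour appears anywhere in the execution trace, else 0.0."""
    seen = set(trace)
    return [1.0 if behaviour in seen else 0.0 for behaviour in vocab]


def hybrid_vector(opcodes: Sequence[str], trace: Sequence[str]) -> List[float]:
    """Concatenate the static and dynamic views into one feature vector."""
    return opcode_frequencies(opcodes, OPCODE_VOCAB) + trace_indicators(trace, BEHAVIOUR_VOCAB)


if __name__ == "__main__":
    static_view = ["mov", "push", "mov", "call", "xor"]   # hypothetical disassembly
    dynamic_view = ["opens_socket", "creates_file"]       # hypothetical trace events
    print(hybrid_vector(static_view, dynamic_view))
    # prints [0.4, 0.2, 0.2, 0.2, 0.0, 1.0, 0.0, 1.0]
```

Concatenation is the simplest way to merge the two views: any standard supervised classifier can then be trained on the labelled hybrid vectors without knowing which features came from static and which from dynamic analysis.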


Keywords: malware, hybrid, static, dynamic, machine learning, computer security





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Igor Santos (1)
  • Jaime Devesa (1)
  • Félix Brezo (1)
  • Javier Nieves (1)
  • Pablo Garcia Bringas (1)
  1. S3Lab, DeustoTech - Computing, Deusto Institute of Technology, University of Deusto, Bilbao, Spain
