International Journal of Information Security, Volume 15, Issue 4, pp 361–379

Malware detection using bilayer behavior abstraction and improved one-class support vector machines

  • Qiguang Miao
  • Jiachen Liu
  • Ying Cao
  • Jianfeng Song
Regular Contribution

Abstract

Malware detection is one of the most challenging problems in computer security. Recently, machine learning methods have become popular for detecting unknown and variant malware. Successful learning depends first on extracting discriminative and stable features. In this paper, we propose a bilayer behavior abstraction method based on semantic analysis of dynamic API sequences. Operations on sensitive system resources and complex behaviors are abstracted in an interpretable way at different semantic layers. At the lower layer, raw API calls are combined into low-layer behaviors via data dependency analysis. At the higher layer, low-layer behaviors are further combined to construct more complex, interpretable high-layer behaviors. The extracted low-layer and high-layer behaviors are finally embedded into a high-dimensional vector space, so they can be used directly by many popular machine learning algorithms. In addition, to handle cases where benign programs are not adequately sampled or malware and benign programs are severely imbalanced, we propose an improved one-class support vector machine (OC-SVM), named OC-SVM-Neg, which makes use of the available negative samples. Experimental results show that the proposed feature extraction method combined with OC-SVM-Neg outperforms binary classifiers in false alarm rate and generalization ability.
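As a rough illustration of the pipeline the abstract describes, the sketch below groups raw API calls into low-layer behaviors and embeds them as a binary feature vector. It is not the authors' implementation: the trace format, the handle-sharing rule used as a stand-in for data dependency analysis, and the names `low_layer_behaviors`, `embed`, and `vocabulary` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical trace format (not from the paper): (api_name, handle, resource).
trace = [
    ("CreateFile", 1, "C:\\Windows\\system32\\evil.dll"),
    ("RegOpenKey", 2, "HKLM\\Software\\Microsoft\\Windows\\CurrentVersion\\Run"),
    ("WriteFile", 1, None),
    ("RegSetValue", 2, None),
    ("CloseHandle", 1, None),
]


def low_layer_behaviors(trace):
    """Group calls that share a handle.

    Sharing a handle is used here as a simplified stand-in for the
    data dependency analysis mentioned in the abstract.
    """
    calls_by_handle = defaultdict(list)
    resource_by_handle = {}
    for api, handle, resource in trace:
        calls_by_handle[handle].append(api)
        if resource is not None:
            resource_by_handle[handle] = resource
    return [
        (tuple(apis), resource_by_handle.get(handle))
        for handle, apis in calls_by_handle.items()
    ]


def embed(behaviors, vocabulary):
    """Return a binary vector: 1 if a known API chain occurs in the trace."""
    seen = {apis for apis, _ in behaviors}
    return [1 if chain in seen else 0 for chain in vocabulary]


# Assumed vocabulary of API chains; each position is one feature dimension.
vocabulary = [
    ("CreateFile", "WriteFile", "CloseHandle"),
    ("RegOpenKey", "RegSetValue"),
    ("CreateFile", "ReadFile", "CloseHandle"),
]

behaviors = low_layer_behaviors(trace)
vector = embed(behaviors, vocabulary)
print(vector)
```

A vector of this kind could then be passed to a one-class or binary classifier. The paper's high-layer abstraction and the OC-SVM-Neg learner are not reproduced here.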

Keywords

Malware detection · Behavior feature extraction · Machine learning · One-class classification

Notes

Acknowledgments

The authors would like to thank the reviewers for their valuable comments and important suggestions. Many thanks to Dr. Ben Stock at the University of Erlangen-Nuremberg for his kind help in sharing many useful malware samples with us. The work was jointly supported by the National Natural Science Foundation of China under Grant No. 61472302, 61272280, U1404620, 41271447 and 61272195; the Program for New Century Excellent Talents in University under Grant No. NCET-12-0919; the Fundamental Research Funds for the Central Universities under Grant No. K5051203020, K5051303016, K5051303018, BDY081422 and K50513100006; the Natural Science Foundation of Shaanxi Province under Grant No. 2014JM8310; the Creative Project of the Science and Technology State of Xi’an under Grant No. CXY1341(6) and CXY1440(1); and the State Key Laboratory of Geo-information Engineering under Grant No. SKLGIE2014-M-4-4.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Qiguang Miao (1)
  • Jiachen Liu (1)
  • Ying Cao (1)
  • Jianfeng Song (1)
  1. School of Computer Science and Technology, Xidian University, Xi’an, China