A New Malware Classification Approach Based on Malware Dynamic Analysis

  • Ying Fang
  • Bo Yu
  • Yong Tang
  • Liu Liu
  • Zexin Lu
  • Yi Wang
  • Qiang Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10343)


Dynamic analysis plays an important role in analyzing malware variants that use obfuscation, polymorphism, and metamorphism techniques. Malware classification is an emerging approach for discriminating between malware families. However, existing malware classification methods show mediocre performance on small-scale datasets, and some machine learning algorithms have difficulty handling imbalanced datasets. To address these issues, we propose an ensemble-learning-based dynamic malware classification approach designed for datasets of different scales. Additionally, we present a novel feature selection method that selects features with strong discriminative power. In particular, we further explore issues in feature representation and feature selection. To verify the efficiency of our approach, we perform a series of comparative experiments against existing feature selection methods, commercial anti-malware tools, and current malware classification techniques. The experimental results demonstrate that our approach classifies malware variants with a high F1-score and low classification time on datasets of different scales.
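The abstract combines an ensemble of classifiers and reports results as F1-scores, but it does not say how the base classifiers' outputs are combined or which F1 average is used. The sketch below is illustrative only and assumes plain plurality (majority) voting over each base classifier's predicted family label, plus a macro-averaged F1-score, which weights every family equally and is therefore a common choice for imbalanced datasets. The function names `majority_vote` and `macro_f1` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-sample labels from several base classifiers by plurality vote.

    predictions_per_model: one list of predicted labels per classifier, all of
    equal length. On a tie, the label first encountered in classifier order
    wins, because Counter.most_common keeps first-seen order for equal counts.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        label, _count = votes.most_common(1)[0]
        combined.append(label)
    return combined


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy predictions from three base classifiers over four samples.
base_predictions = [
    ["A", "B", "A", "C"],
    ["A", "A", "B", "C"],
    ["B", "B", "A", "C"],
]
ensemble = majority_vote(base_predictions)   # ["A", "B", "A", "C"]
print(ensemble, macro_f1(["A", "B", "B", "C"], ensemble))
```

With the toy true labels `["A", "B", "B", "C"]`, families A and B each score an F1 of 2/3 and C scores 1, so the macro average is 7/9. Soft voting over class probabilities or weighted voting are equally plausible alternatives; the abstract does not settle which is used.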


Keywords: Malware classification, Ensemble learning, Feature selection, TF-IDF
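The keywords name TF-IDF as part of the feature representation, but the abstract gives no details of the variant used. The sketch below assumes that each dynamically recorded sample is a sequence of API call names (the "terms") and applies the textbook weighting: term frequency normalized by trace length, multiplied by an unsmoothed inverse document frequency log(N / df). The trace contents and the function name `tfidf` are hypothetical.

```python
import math
from collections import Counter


def tfidf(traces):
    """Return one {api_name: weight} dict per trace.

    tf(t, d) = count of t in d / length of d
    idf(t)   = log(N / df(t)), where N is the number of traces and
               df(t) is the number of traces that contain t at least once.
    """
    n_docs = len(traces)
    doc_freq = Counter()
    for trace in traces:
        doc_freq.update(set(trace))  # count each API once per trace
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        length = len(trace)
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical API call traces captured from three sandboxed samples.
traces = [
    ["NtCreateFile", "NtWriteFile", "NtWriteFile"],
    ["NtCreateFile", "RegSetValueExW"],
    ["NtCreateFile", "NtWriteFile"],
]
for vector in tfidf(traces):
    print(vector)
```

An API that every sample calls (here `NtCreateFile`) gets weight 0, while a rarer call such as `RegSetValueExW` is weighted up. This is why TF-IDF weights are a natural input for selecting features with strong discriminative power, although the paper's own selection method is not described in the abstract.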



Project supported by the National Natural Science Foundation of China (No. 61472437 and No. 61379148)



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ying Fang, Bo Yu, Yong Tang, Liu Liu, Zexin Lu, Yi Wang, Qiang Yang: National University of Defense Technology, Changsha, China
