Multi-context features for detecting malicious programs

  • Moustafa Saleh
  • Tao Li
  • Shouhuai Xu
Original Paper


Malware detection remains an open problem. Attacks that use malware to steal private information, disrupt services, or sabotage industrial systems take place every day. In this paper, we combine three kinds of contextual information, namely static, dynamic, and instruction-based, for malware detection. This leads to the definition of more than thirty thousand features, a large feature set that covers a wide range of a sample's characteristics. Through experiments with one million files, we show that this feature set leads to machine-learning-based models that can detect both malware seen at roughly the time the models are built and malware first seen months after the models were built (i.e., the detection models remain effective for months after they are built). This may be due to the comprehensiveness of the feature set.


Keywords: Malware Detection, Machine Learning, Code Obfuscation



We thank VirusTotal for providing us with the dataset analyzed in this paper. We also thank John Charlton for proofreading the paper. The research was supported in part by ARO Grant #W911NF-13-1-0141 and NSF Grants #1111925, #IIS-1213026, and #CNS-1461926.



Copyright information

© Springer-Verlag France SAS 2017

Authors and Affiliations

  1. Microsoft Malware Protection Center, Microsoft, Redmond, USA
  2. School of Computer Science, Florida International University, Miami, USA
  3. Department of Computer Science, University of Texas at San Antonio, San Antonio, USA
