Subroutine based detection of APT malware

  • Joseph Sexton
  • Curtis Storlie
  • Blake Anderson
Original Paper


Statistical detection of mass malware has been shown to be highly successful. However, this type of malware is less interesting to cyber security officers of larger organizations, who are more concerned with detecting malware indicative of a targeted attack. Here we investigate the potential of statistically based approaches to detect such malware, using a malware family associated with a large number of targeted network intrusions. Our approach is complementary to the bulk of statistically based malware classifiers, which are typically based on measures of overall similarity between executable files. One problem with relying on overall similarity is that a malicious executable that shares some, but limited, functionality with known malware is likely to be misclassified as benign. Here a new approach to malware classification is introduced that classifies programs based on their similarity to known malware subroutines. It is shown that malware and benign programs can share a substantial amount of code, implying that classification should be based on malicious subroutines that occur infrequently, or not at all, in benign programs. Various approaches to accomplishing this task are investigated, and a particularly simple approach appears the most effective. This approach computes the fraction of a program's subroutines that are similar to malware subroutines for which no similar subroutine has been found in a larger benign set. If this fraction exceeds around 1.5%, the corresponding program can be classified as malicious at a false alarm rate of 1 in 1000. It is further shown that combining the local and overall similarity based approaches can lead to considerably better prediction, owing to the relatively low correlation of their predictions.
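The simple fraction-based rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how subroutines are represented or compared, so the sketch assumes each subroutine is a set of tokens (for example, opcode n-grams) compared by Jaccard similarity. The function names and the similarity threshold of 0.8 are hypothetical. Only the cutoff of roughly 1.5% comes from the abstract, and it is tied to the authors' data and their 1 in 1000 false alarm rate.

```python
from typing import FrozenSet, List, Sequence

Subroutine = FrozenSet[str]  # assumed representation: a set of tokens per subroutine


def jaccard(a: Subroutine, b: Subroutine) -> float:
    """Jaccard similarity; a stand-in for whatever measure the paper uses."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def malicious_only(malware_subs: Sequence[Subroutine],
                   benign_subs: Sequence[Subroutine],
                   sim_threshold: float) -> List[Subroutine]:
    """Keep malware subroutines with no similar counterpart in the benign set."""
    return [m for m in malware_subs
            if all(jaccard(m, b) < sim_threshold for b in benign_subs)]


def flagged_fraction(program_subs: Sequence[Subroutine],
                     reference_subs: Sequence[Subroutine],
                     sim_threshold: float) -> float:
    """Fraction of a program's subroutines similar to any reference subroutine."""
    if not program_subs:
        return 0.0
    hits = sum(1 for s in program_subs
               if any(jaccard(s, r) >= sim_threshold for r in reference_subs))
    return hits / len(program_subs)


def is_malicious(program_subs: Sequence[Subroutine],
                 reference_subs: Sequence[Subroutine],
                 sim_threshold: float = 0.8,      # hypothetical value
                 fraction_cutoff: float = 0.015   # "around 1.5%" from the abstract
                 ) -> bool:
    """Classify a program as malicious when the flagged fraction exceeds the cutoff."""
    return flagged_fraction(program_subs, reference_subs, sim_threshold) > fraction_cutoff


# Toy data: one malware subroutine also appears in benign code and is filtered out.
malware = [frozenset({"a", "b", "c"}), frozenset({"x", "y", "z"})]
benign = [frozenset({"a", "b", "c"})]
reference = malicious_only(malware, benign, sim_threshold=0.8)

# A program with 10 subroutines, one of which matches the remaining reference.
suspect = [frozenset({"x", "y", "z"})] + [frozenset({f"f{i}"}) for i in range(9)]
clean = [frozenset({"a", "b", "c"}), frozenset({"q"})]

print(is_malicious(suspect, reference))
print(is_malicious(clean, reference))
```

Filtering the reference set against benign code first reflects the abstract's point that malware and benign programs share a substantial amount of code, so matches on shared subroutines carry little evidence.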


APT, Malware detection, Static analysis, Subroutine similarity



The authors would like to thank three reviewers whose comments resulted in a considerably improved manuscript.



Copyright information

© Springer-Verlag France (Outside the USA) 2015

Authors and Affiliations

  1. Los Alamos National Laboratory, Los Alamos, USA
