Journal in Computer Virology, Volume 7, Issue 4, pp 247–258

Graph-based malware detection using dynamic analysis

  • Blake Anderson
  • Daniel Quist
  • Joshua Neil
  • Curtis Storlie
  • Terran Lane
Original paper

Abstract

We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated from the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is passed to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but instead use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.
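The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which graph kernels are combined, so a single Gaussian kernel on the transition matrices stands in here as an assumption. The instruction vocabulary, the toy traces, and the function names (`transition_matrix`, `gaussian_kernel`, `similarity_matrix`) are all hypothetical.

```python
import math

def transition_matrix(trace, vocab):
    """Estimate Markov-chain transition probabilities from one instruction trace.

    Vertices are the instructions in `vocab`; entry [i][j] is the fraction of
    times instruction i was immediately followed by instruction j.
    """
    index = {op: i for i, op in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(trace, trace[1:]):
        counts[index[a]][index[b]] += 1.0
    for row in counts:
        total = sum(row)
        if total > 0:  # instructions never seen as a source keep an all-zero row
            for j in range(n):
                row[j] /= total
    return counts

def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel on two transition matrices (an assumed stand-in kernel)."""
    sq_dist = sum((x - y) ** 2 for rp, rq in zip(p, q) for x, y in zip(rp, rq))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def similarity_matrix(traces, vocab, sigma=1.0):
    """Pairwise kernel values between the graphs built from each trace."""
    graphs = [transition_matrix(t, vocab) for t in traces]
    return [[gaussian_kernel(g, h, sigma) for h in graphs] for g in graphs]

# Hypothetical toy data: three short traces over a five-instruction vocabulary.
vocab = ["mov", "push", "call", "add", "jmp"]
traces = [
    ["mov", "push", "call", "mov", "push", "call"],
    ["mov", "push", "call", "mov", "add", "jmp"],
    ["jmp", "jmp", "add", "jmp", "jmp", "add"],
]
K = similarity_matrix(traces, vocab)
```

In this sketch the first two traces share most of their transitions, so `K[0][1]` comes out larger than `K[0][2]`. A matrix like `K` could then be handed to any support vector machine that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`, to carry out the classification step.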


Copyright information

© Springer-Verlag France 2011

Authors and Affiliations

  • Blake Anderson (1)
  • Daniel Quist (1)
  • Joshua Neil (1)
  • Curtis Storlie (1)
  • Terran Lane (2)
  1. Los Alamos National Lab, Los Alamos, USA
  2. The University of New Mexico, Albuquerque, USA
