Learning and Classification of Malware Behavior

  • Konrad Rieck
  • Thorsten Holz
  • Carsten Willems
  • Patrick Düssel
  • Pavel Laskov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5137)

Abstract

Malicious software in form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and amount of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of malware families share typical behavioral patterns reflecting its origin and purpose. We aim to exploit these shared patterns for classification of malware and propose a method for learning and discrimination of malware behavior. Our method proceeds in three stages: (a) behavior of collected malware is monitored in a sandbox environment, (b) based on a corpus of malware labeled by an anti-virus scanner a malware behavior classifier is trained using learning techniques and (c) discriminative features of the behavior models are ranked for explanation of classification decisions. Experiments with different heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
  2. 2.
    Avira. AntiVir PersonalEdition Classic (2007), http://www.avira.de/en/products/personal.html
  3. 3.
    Baecher, P., Koetter, M., Holz, T., Dornseif, M., Freiling, F.C.: The Nepenthes Platform: An Efficient Approach to Collect Malware. In: Zamboni, D., Krügel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 165–184. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  4. 4.
    Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated Classification and Analysis of Internet Malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  5. 5.
    Bayer, U., Kruegel, C., Kirda, E.: TTAnalyze: A tool for analyzing malware. In: Proceedings of EICAR 2006 (April 2006)Google Scholar
  6. 6.
    Bayer, U., Moser, A., Kruegel, C., Kirda, E.: Dynamic analysis of malicious code. Journal in Computer Virology 2, 67–77 (2006)CrossRefGoogle Scholar
  7. 7.
    Burges, C.: A tutorial on support vector machines for pattern recognition. Knowledge Discovery and Data Mining 2(2), 121–167 (1998)CrossRefGoogle Scholar
  8. 8.
    Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: Proceedings of the 12th USENIX Security Symposium, p. 12(2003)Google Scholar
  9. 9.
    Christodorescu, M., Jha, S., Kruegel, C.: Mining specifications of malicious behavior. In: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) (2007)Google Scholar
  10. 10.
    Christodorescu, M., Jha, S., Seshia, S.A., Song, D.X., Bryant, R.E.: Semantics-aware malware detection. In: IEEE Symposium on Security and Privacy, pp. 32–46 (2005)Google Scholar
  11. 11.
    Egele, M., Kruegel, C., Kirda, E., Yin, H., Song, D.: Dynamic spyware analysis. In: Proceedings of USENIX Annual Technical Conference (June 2007)Google Scholar
  12. 12.
    Flake, H.: Structural comparison of executable objects. In: Proceedings of Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2004) (2004)Google Scholar
  13. 13.
    Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: Proceedings of the 15th USENIX Security Symposium, pp. 241–256 (2006)Google Scholar
  14. 14.
    Hunt, G.C., Brubacker, D.: Detours: Binary interception of Win32 functions. In: Proceedings of the 3rd USENIX Windows NT Symposium, pp. 135–143 (1999)Google Scholar
  15. 15.
    Jiang, X., Xu, D.: Collapsar: A VM-based architecture for network attack detention center. In: Proceedings of the 13th USENIX Security Symposium (2004)Google Scholar
  16. 16.
    Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Proceedings of the European Conference on Machine Learning, pp. 137–142. Springer, Heidelberg (1998)Google Scholar
  17. 17.
    Joachims, T.: Learning to Classify Text using Support Vector Machines. Kluwer Academic Publishers, Dordrecht (2002)Google Scholar
  18. 18.
    Karim, M., Walenstein, A., Lakhotia, A., Laxmi, P.: Malware phylogeny generation using permutations of code. Journal in Computer Virology 1(1–2), 13–23 (2005)CrossRefGoogle Scholar
  19. 19.
    Kirda, E., Kruegel, C., Banks, G., Vigna, G., Kemmerer, R.A.: Behavior-based spyware detection. In: Proceedings of the 15th USENIX Security Symposium, p. 19 (2006)Google Scholar
  20. 20.
    Kolter, J., Maloof, M.: Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research 7, 2721–2744 (2006)MathSciNetGoogle Scholar
  21. 21.
    Kruegel, C., Robertson, W., Vigna, G.: Detecting kernel-level rootkits through binary analysis. In: Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC) (2004)Google Scholar
  22. 22.
    Lee, T., Mody, J.J.: Behavioral classification. In: Proceedings of EICAR 2006 (April 2006)Google Scholar
  23. 23.
    Leita, C., Dacier, M., Massicotte, F.: Automatic Handling of Protocol Dependencies and Reaction to 0-Day Attacks with ScriptGen Based Honeypots. In: Zamboni, D., Krügel, C. (eds.) RAID 2006. LNCS, vol. 4219. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  24. 24.
    Moser, A., Kruegel, C., Kirda, E.: Exploring multiple execution paths for malware analysis. In: Proceedings of 2007 IEEE Symposium on Security and Privacy (2007)Google Scholar
  25. 25.
    Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC) (to appear, 2007)Google Scholar
  26. 26.
    Norman. Norman sandbox information center (accessed, 2007), http://sandbox.norman.no/
  27. 27.
    Platt, J.: Probabilistic outputs for Support Vector Machines and comparison to regularized likelihood methods. In: Smola, A., Bartlett, P., Schölkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classifiers. MIT Press, Cambridge (2001)Google Scholar
  28. 28.
    Pouget, F., Dacier, M., Pham, V.H.: Leurre.com: on the advantages of deploying a large scale distributed honeypot platform. In: ECCE 2005, E-Crime and Computer Conference, March 29-30, Monaco (March 2005)Google Scholar
  29. 29.
    Rieck, K., Laskov, P.: Linear-time computation of similarity measures for sequential data. Journal of Machine Learning Research 9, 23–48 (2008)Google Scholar
  30. 30.
    Salton, G., Wong, A., Yang, C.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)CrossRefMATHGoogle Scholar
  31. 31.
    Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)Google Scholar
  32. 32.
    Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)Google Scholar
  33. 33.
    Sonnenburg, S., Rätsch, G., Schäfer, C., Schölkopf, B.: Large scale multiple kernel learning. Journal of Maching Learning Research 7, 1531–1565 (2006)Google Scholar
  34. 34.
    Szor, P.: The Art of Computer Virus Research and Defense. Addison-Wesley, Reading (2005)Google Scholar
  35. 35.
    Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Chichester (1998)MATHGoogle Scholar
  36. 36.
    Virus Bulletin. AVK tops latest AV-Test charts (August 2007), http://www.virusbtn.com/news/2007/08_22a.xml
  37. 37.
    Vrable, M., Ma, J., Chen, J., Moore, D., Vandekieft, E., Snoeren, A.C., Voelker, G.M., Savage, S.: Scalability, fidelity, and containment in the potemkin virtual honeyfarm. SIGOPS Oper. Syst. Rev. 39(5), 148–162 (2005)CrossRefGoogle Scholar
  38. 38.
    Wagner, D., Soto, P.: Mimicry attacks on host based intrusion detection systems. In: Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS 2002), pp. 255–264 (2002)Google Scholar
  39. 39.
    Willems, C., Holz, T., Freiling, F.: CWSandbox: Towards automated dynamic binary analysis. IEEE Security and Privacy 5(2) (2007)Google Scholar
  40. 40.
    Yin, H., Song, D., Egele, M., Kruegel, C., Kirda, E.: Panorama: Capturing system-wide information flow for malware detection and analysis. In: Proceedings of ACM Conference on Computer and Communication Security (October 2007)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Konrad Rieck
    • 1
  • Thorsten Holz
    • 2
  • Carsten Willems
    • 2
  • Patrick Düssel
    • 1
  • Pavel Laskov
    • 1
    • 3
  1. 1.Intelligent Data Analysis DepartmentFraunhofer Institute FIRSTBerlinGermany
  2. 2.Laboratory for Dependable Distributed SystemsUniversity of MannheimMannheimGermany
  3. 3.Wilhelm-Schickard-Institute for Computer ScienceUniversity of TübingenTübingenGermany

Personalised recommendations