Pattern Recognition Techniques for the Classification of Malware Packers

  • Conference paper
Information Security and Privacy (ACISP 2010)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 6168)

Abstract

Packing is the most common obfuscation method used by malware writers to hinder malware detection and analysis. The number of new packers and variants of existing ones has increased dramatically, and packers employ increasingly sophisticated anti-unpacker tricks and obfuscation methods. This makes traditional static packer identification and classification, which relies mainly on the packer’s byte signature, difficult, costly and time-consuming for anti-virus (AV) researchers.

In this paper, we present a simple yet fast and effective packer classification framework that applies pattern recognition techniques to automatically extracted randomness profiles of packers. The system runs without manual input from AV researchers. We test several statistical classification algorithms, including k-Nearest Neighbor, Best-first Decision Tree, Sequential Minimal Optimization and Naive Bayes, on a large data set that consists of clean packed files and 17,336 real malware samples. Experimental results demonstrate that our packer classification system achieves extremely high effectiveness (> 99%). The experiments also confirm that the randomness profile used in the system is a very strong feature for packer classification and can be applied with high accuracy to real malware samples.
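The general idea of classifying files by a randomness profile can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the randomness test, so sliding-window Shannon entropy is swapped in as a stand-in, and the function names, window size, bin count and packer labels are all hypothetical. The k-Nearest Neighbor step is a plain majority vote over Euclidean distance.

```python
import math
from collections import Counter


def randomness_profile(data: bytes, window: int = 32, bins: int = 8) -> list[float]:
    """Return a fixed-length vector of per-window Shannon entropy values.

    Assumption: Shannon entropy stands in for the paper's randomness test,
    which the abstract does not describe.
    """
    if len(data) < window:
        raise ValueError("data is shorter than one window")

    # Entropy (in bits per byte) of each non-overlapping window.
    scores = []
    for start in range(0, len(data) - window + 1, window):
        counts = Counter(data[start:start + window])
        entropy = -sum((c / window) * math.log2(c / window) for c in counts.values())
        scores.append(entropy)

    # Average the scores into `bins` buckets so files of different
    # sizes map to vectors of the same length.
    profile = []
    for b in range(bins):
        lo = b * len(scores) // bins
        hi = max(lo + 1, (b + 1) * len(scores) // bins)
        chunk = scores[lo:hi]
        profile.append(sum(chunk) / len(chunk))
    return profile


def knn_classify(query: list[float], training: list[tuple[list[float], str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training profiles."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, one would build `training` from profiles of files whose packer is already known, for example `[(randomness_profile(sample_bytes), "packer_A"), ...]`, and then call `knn_classify(randomness_profile(unknown_bytes), training)`. The fixed-length binning is a design choice of this sketch so that distances between files of different sizes are well defined.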

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, L., Versteeg, S., Boztaş, S., Yann, T. (2010). Pattern Recognition Techniques for the Classification of Malware Packers. In: Steinfeld, R., Hawkes, P. (eds) Information Security and Privacy. ACISP 2010. Lecture Notes in Computer Science, vol 6168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14081-5_23

  • DOI: https://doi.org/10.1007/978-3-642-14081-5_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14080-8

  • Online ISBN: 978-3-642-14081-5

  • eBook Packages: Computer Science, Computer Science (R0)
