A Static, Packer-Agnostic Filter to Detect Similar Malware Samples

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2012)

Abstract

The steadily increasing number of malware variants is a significant problem, clogging the input queues of automated analysis tools. Automatic packers and polymorphic engines make it easy to generate variants: they use encryption and compression to produce a multitude of distinct versions of the same program. A great deal of time and resources could be saved by prioritizing the samples to analyze, either to avoid repeated analysis of variants and focus on novel malware or, on the contrary, to re-analyze variants and gain better insight into their evolution. Unfortunately, indexing in malware analysis tools and repositories relies on executable digests (hashes), which differ completely from one variant to the next.

In this paper, we present a robust filter to quickly determine when a malware program is similar to a previously seen sample. Compared to previous work, our similarity measure does not require the costly task of preliminary unpacking but instead operates directly on packed code. Our approach exploits the fact that current packers use compression and weak encryption schemes that do not destroy, in the packed versions, all of the similarities that exist between the original versions of two programs. In addition, we introduce a packer detection technique that is able to distinguish between different levels of protection, such as unpacked, compressed, encrypted, and multi-layer encrypted code. This allows us to optimize the sensitivity of the similarity measure accordingly. We evaluated our approach on a large malware repository containing 795,000 samples. Our results show that the similarity measure is highly effective in filtering out malware variants, even after re-packing, and can reduce the number of samples that need to be analyzed by a factor of 3 to 5.
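The abstract does not specify the similarity measure or the packer detection technique, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes byte n-gram Jaccard similarity as a stand-in for the similarity measure and Shannon entropy as a stand-in for the protection-level estimate. The function names, the entropy cut-offs (6.0 and 7.5 bits per byte), and the thresholds (0.8, 0.6, 0.4) are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_set(data: bytes, n: int = 4) -> set:
    """Set of all distinct byte n-grams occurring in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index of the two n-gram sets (1.0 = identical sets)."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity_threshold(sample: bytes) -> float:
    """Hypothetical rule: the higher the entropy (the stronger the assumed
    protection), the lower the score we require to call two samples similar."""
    h = shannon_entropy(sample)
    if h < 6.0:    # assumed: plain, unpacked code
        return 0.8
    if h < 7.5:    # assumed: compressed or weakly encrypted
        return 0.6
    return 0.4     # assumed: strongly or multi-layer encrypted


def is_similar(sample: bytes, known: bytes, n: int = 4) -> bool:
    """Compare a new sample against a previously seen one, operating
    directly on the raw (possibly packed) bytes."""
    return jaccard_similarity(sample, known, n) >= similarity_threshold(sample)
```

A filter built this way would call `is_similar` on each incoming sample against the stored samples and queue only the ones with no match for full analysis. Real packed binaries would need a more careful measure than plain n-gram overlap, since the n-gram sets of large high-entropy inputs tend to overlap by chance.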

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jacob, G., Comparetti, P.M., Neugschwandtner, M., Kruegel, C., Vigna, G. (2013). A Static, Packer-Agnostic Filter to Detect Similar Malware Samples. In: Flegel, U., Markatos, E., Robertson, W. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2012. Lecture Notes in Computer Science, vol 7591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37300-8_6

  • DOI: https://doi.org/10.1007/978-3-642-37300-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37299-5

  • Online ISBN: 978-3-642-37300-8

  • eBook Packages: Computer Science, Computer Science (R0)
