A Static, Packer-Agnostic Filter to Detect Similar Malware Samples

Jacob, Grégoire; Comparetti, Paolo Milani; Neugschwandtner, Matthias; Kruegel, Christopher; Vigna, Giovanni

doi:10.1007/978-3-642-37300-8_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7591)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

The steadily increasing number of malware variants is a significant problem, clogging the input queues of automated analysis tools. Automatic packers and polymorphic engines make variants easy to generate: they use encryption and compression to produce a multitude of distinct versions. A great deal of time and resources could be saved by prioritizing which samples to analyze, either to avoid repeatedly analyzing variants and focus on novel malware, or, conversely, to re-analyze variants and gain better insight into their evolution. Unfortunately, indexing in malware analysis tools and repositories relies on executable digests (hashes), which are completely different for each variant.
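The last point can be made concrete with a minimal Python sketch (an illustration, not code from the paper). Two hypothetical byte strings that differ in a single bit receive unrelated SHA-256 digests, so an index keyed on exact digests cannot recognize them as variants. SHA-256 is assumed here as a stand-in for whatever digest a given repository uses.

```python
import hashlib

# Two HYPOTHETICAL "variants": identical byte strings except for one bit,
# standing in for a slightly modified or re-packed executable.
original = bytes(range(256)) * 16
variant = bytearray(original)
variant[100] ^= 0x01  # flip the lowest bit of a single byte
variant = bytes(variant)

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

h1 = sha256_hex(original)
h2 = sha256_hex(variant)

# An exact-match index treats the two samples as unrelated.
print(h1 == h2)
# Any coinciding hex digits are due to chance, not to the shared content.
matching = sum(a == b for a, b in zip(h1, h2))
print(f"{matching}/64 hex digits coincide")
```

The first `print` shows `False`: a one-bit change is enough to produce a different digest.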

In this paper, we present a robust filter that quickly determines when a malware program is similar to a previously seen sample. Compared to previous work, our similarity measure does not require costly preliminary unpacking; instead, it operates directly on packed code. Our approach exploits the fact that current packers use compression and weak encryption schemes that do not entirely destroy, in the packed versions, the similarities between the original versions of two programs. In addition, we introduce a packer detection technique that is able to distinguish between different levels of protection, such as unpacked, compressed, encrypted, and multi-layer encrypted code. This allows us to tune the sensitivity of the similarity measure to the detected level of protection. We evaluated our approach on a large malware repository containing 795,000 samples. Our results show that the similarity measure is highly effective in filtering out malware variants, even after re-packing, and can reduce the number of samples that need to be analyzed by a factor of 3 to 5.
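The abstract does not specify the paper's similarity measure or packer detector, so the sketch below does not reproduce them. It illustrates the general idea with two simpler, well-known stand-ins, both of which are assumptions:

- Shannon entropy as a crude packing indicator.
- Cosine similarity between byte-bigram frequency profiles as the similarity score.

The entropy threshold and the two similarity cutoffs are hypothetical values. They are chosen only to show how a filter might adapt its sensitivity to a detected protection level.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Crude heuristic: high entropy suggests compressed or encrypted code.

    The 7.0 bits/byte threshold is a HYPOTHETICAL illustrative value.
    Entropy alone cannot reliably tell compression from encryption.
    """
    return shannon_entropy(data) >= threshold

def bigram_profile(data: bytes) -> Counter:
    """Count each overlapping pair of consecutive bytes."""
    return Counter(zip(data, data[1:]))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_similar(x: bytes, y: bytes) -> bool:
    """Flag x and y as likely variants of each other.

    HYPOTHETICAL policy: apply a stricter cutoff when either sample looks
    packed, mimicking the idea of adapting sensitivity to protection level.
    """
    cutoff = 0.95 if (looks_packed(x) or looks_packed(y)) else 0.80
    return cosine_similarity(bigram_profile(x), bigram_profile(y)) >= cutoff
```

A triage pipeline could, for instance, deprioritize a new sample whenever `is_similar(new_sample, seen_sample)` holds for some already-analyzed sample. Both names are hypothetical.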

Author information

Authors and Affiliations

University of California, Santa Barbara, USA
Grégoire Jacob, Christopher Kruegel & Giovanni Vigna
Vienna University of Technology, Austria
Paolo Milani Comparetti & Matthias Neugschwandtner
Télécom SudParis, France
Grégoire Jacob
LastLine, Inc., USA
Grégoire Jacob, Paolo Milani Comparetti, Christopher Kruegel & Giovanni Vigna

Editor information

Editors and Affiliations

Department C, HFT Stuttgart, Schellingstr. 24, 70174, Stuttgart, Germany
Ulrich Flegel
Department of Computer Science, Foundation for Research and Technology – Hellas (FORTH), 100 Plastira Ave, Vassilika Vouton, 70013, Heraklion, Crete, Greece
Evangelos Markatos
College of Computer and Information Science, Northeastern University, 360 Huntington Ave, 02115, Boston, MA, USA
William Robertson

About this paper

Cite this paper

Jacob, G., Comparetti, P.M., Neugschwandtner, M., Kruegel, C., Vigna, G. (2013). A Static, Packer-Agnostic Filter to Detect Similar Malware Samples. In: Flegel, U., Markatos, E., Robertson, W. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2012. Lecture Notes in Computer Science, vol 7591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37300-8_6

DOI: https://doi.org/10.1007/978-3-642-37300-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37299-5
Online ISBN: 978-3-642-37300-8
eBook Packages: Computer Science, Computer Science (R0)
