Abstract
The steadily increasing number of malware variants is a significant problem that clogs the input queues of automated analysis tools. Automatic packers and polymorphic engines make variants easy to generate: through encryption and compression, they produce a multitude of distinct versions of the same program. A great deal of time and resources could be saved by prioritizing the samples to analyze, either to avoid repeated analysis of variants and focus on novel malware or, on the contrary, to re-analyze variants and gain better insight into their evolution. Unfortunately, indexing in malware analysis tools and repositories relies on executable digests (hashes), which differ strongly for each variant.
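The limitation of digest-based indexing can be illustrated with a minimal Python sketch. It assumes SHA-256 as the digest (the abstract does not name a specific hash function) and uses synthetic bytes in place of a real executable; the helper names `digest` and `differing_hex_digits` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Synthetic stand-in for an executable: 4 KiB of deterministic bytes.
original = bytes(range(256)) * 16

# A "variant" that differs from the original in a single bit.
mutable = bytearray(original)
mutable[100] ^= 0x01
variant = bytes(mutable)

d_original = digest(original)
d_variant = digest(variant)

print("digests equal:", d_original == d_variant)
print(differing_hex_digits(d_original, d_variant), "of", len(d_original),
      "hex digits differ")
```

Although the two inputs are almost identical, their digests are unrelated, so an index keyed on the digest treats them as entirely different samples.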
In this paper, we present a robust filter to quickly determine when a malware program is similar to a previously seen sample. Compared to previous work, our similarity measure does not require the costly task of preliminary unpacking but instead operates directly on packed code. Our approach exploits the fact that current packers use compression and weak encryption schemes that do not destroy, in the packed versions, all of the similarities that exist between the original versions of two programs. In addition, we introduce a packer detection technique that is able to distinguish between different levels of protection, such as unpacked, compressed, encrypted, and multi-layer encrypted code. This allows us to optimize the sensitivity of the similarity measure accordingly. We evaluated our approach on a large malware repository containing 795,000 samples. Our results show that the similarity measure is highly effective in filtering out malware variants, even after re-packing, and can reduce the number of samples that need to be analyzed by a factor of 3 to 5.
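The intuition that weak encryption preserves similarity can be sketched as follows. This is not the paper's similarity measure or packer detector, neither of which is specified in the abstract. It is an illustration under stated assumptions: a single-byte XOR stands in for a "weak encryption scheme", Jaccard similarity over byte 4-grams stands in for a similarity measure, and Shannon byte entropy stands in for one possible signal of the protection level. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_set(data: bytes, n: int = 4) -> set:
    """Set of all contiguous byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the n-gram sets of a and b."""
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def xor_encrypt(data: bytes, key: int) -> bytes:
    """Toy 'weak encryption': XOR every byte with one fixed key byte."""
    return bytes(b ^ key for b in data)


# Two synthetic "programs" sharing most of their content (hypothetical data).
shared = b"".join(str(i).encode() + b":handler;" for i in range(200))
sample_a = shared + b"config=variant-A"
sample_b = shared + b"config=variant-B;extra=payload"

# "Pack" both with the same toy scheme and key.
packed_a = xor_encrypt(sample_a, 0x5A)
packed_b = xor_encrypt(sample_b, 0x5A)

plain_similarity = jaccard(sample_a, sample_b)
packed_similarity = jaccard(packed_a, packed_b)

print("similarity before packing:", round(plain_similarity, 3))
print("similarity after packing: ", round(packed_similarity, 3))
print("entropy of sample_a:", round(byte_entropy(sample_a), 3))
print("entropy of packed_a:", round(byte_entropy(packed_a), 3))
```

Because a fixed-key XOR maps bytes one to one, the n-gram overlap and the byte entropy are unchanged by this toy scheme, while a file digest would change completely. Real packers combine compression with stronger transformations, so similarity is only partially preserved in practice, which is presumably why the sensitivity of the measure is tuned to the detected protection level.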
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jacob, G., Comparetti, P.M., Neugschwandtner, M., Kruegel, C., Vigna, G. (2013). A Static, Packer-Agnostic Filter to Detect Similar Malware Samples. In: Flegel, U., Markatos, E., Robertson, W. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2012. Lecture Notes in Computer Science, vol 7591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37300-8_6
DOI: https://doi.org/10.1007/978-3-642-37300-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37299-5
Online ISBN: 978-3-642-37300-8
eBook Packages: Computer Science, Computer Science (R0)