DepSim: A Dependency-Based Malware Similarity Comparison System

  • Yang Yi
  • Ying Lingyun
  • Wang Rui
  • Su Purui
  • Feng Dengguo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6584)


Comparing unknown files with previously known malicious samples is important in malware analysis, because it lets analysts quickly characterize a sample's behavior and generate signatures. Malware writers often use obfuscation techniques, such as packing and junk insertion, to thwart traditional similarity comparison methods. In this paper, we introduce DepSim, a novel technique for finding dependency similarities between malicious binary programs. DepSim uses taint analysis to construct dependency graphs of a program's control flow and data flow, and then conducts similarity analysis using a new graph isomorphism technique. To improve accuracy and resistance to interference, we reduce redundant loops and remove junk actions in the dependency-graph pre-processing phase, which also greatly improves the performance of our comparison algorithm. We implemented a prototype of DepSim and evaluated it on malware in the wild. Our prototype system successfully identified semantic similarities between malware samples and revealed their inner similarity in program logic and behavior. The results demonstrate that our technique is accurate.
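The pre-processing idea in the abstract can be illustrated with a toy sketch. This is not DepSim's algorithm: the abstract does not describe the graph isomorphism technique, so a plain Jaccard overlap of edges stands in for it, and "junk" is assumed here to mean any action from which no security-relevant sink is reachable. The node labels, the `SINKS` set, and all function names are hypothetical.

```python
def build_graph(edges):
    """Build an adjacency map from (src, dst) dependency pairs.

    Duplicate edges collapse automatically (sets), and trivial
    self-loops are dropped as a crude stand-in for loop reduction.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        if src != dst:
            graph[src].add(dst)
    return graph


def prune_junk(graph, sinks):
    """Keep only nodes from which some sink is reachable.

    Assumption: an action that never influences a sink is junk.
    """
    # Reverse the edges so we can walk backwards from the sinks.
    reverse = {node: set() for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].add(src)

    keep = set()
    stack = [s for s in sinks if s in graph]
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(reverse[node])
    return {n: {d for d in graph[n] if d in keep} for n in keep}


def edge_set(graph):
    """Flatten an adjacency map into a set of (src, dst) pairs."""
    return {(s, d) for s, dsts in graph.items() for d in dsts}


def similarity(g1, g2):
    """Jaccard overlap of edge sets (a stand-in, not graph isomorphism)."""
    e1, e2 = edge_set(g1), edge_set(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)


# Hypothetical dependency edges for two variants of the same logic.
SAMPLE_A = [("recv", "decode"), ("decode", "WriteFile"),
            ("recv", "junk1"), ("junk1", "junk2"), ("decode", "decode")]
SAMPLE_B = [("recv", "decode"), ("decode", "WriteFile"),
            ("GetTickCount", "junk3")]
SINKS = {"WriteFile"}

raw_score = similarity(build_graph(SAMPLE_A), build_graph(SAMPLE_B))
clean_score = similarity(prune_junk(build_graph(SAMPLE_A), SINKS),
                         prune_junk(build_graph(SAMPLE_B), SINKS))
print(f"raw={raw_score:.2f} pruned={clean_score:.2f}")
```

In this toy case the inserted junk actions lower the raw score, while pruning them leaves two identical graphs. Matching nodes by label is a simplification; a real comparison would have to match structure without relying on names.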


Keywords: Malware Analysis; Similarity Analysis; Dynamic Taint Analysis



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yang Yi (1, 2)
  • Ying Lingyun (1, 3)
  • Wang Rui (2)
  • Su Purui (1)
  • Feng Dengguo (1, 2)
  1. State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, Beijing, China
  2. State Key Laboratory of Information Security, Graduate University of Chinese Academy of Sciences, Beijing, China
  3. National Engineering Research Center for Information Security, Beijing, China
