Automatic Generation of String Signatures for Malware Detection

  • Kent Griffin
  • Scott Schneider
  • Xin Hu
  • Tzi-cker Chiueh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5758)


Scanning files for signatures is a proven technology, but exponential growth in the number of unique malware programs has caused an explosion in signature database sizes. One solution to this problem is to use string signatures, each of which is a contiguous byte sequence that can potentially match many variants of a malware family. However, it is not clear how to generate these string signatures automatically with a sufficiently low false positive rate. Hancock is the first string signature generation system that takes on this challenge on a large scale.

To minimize the false positive rate, Hancock features a scalable model that estimates the occurrence probability of arbitrary byte sequences in goodware programs, a set of library code identification techniques, and diversity-based heuristics that ensure that the contexts in which a signature appears, across the malware files containing it, are similar to one another. With these techniques combined, Hancock is able to automatically generate string signatures with a false positive rate below 0.1%.
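To make the idea of estimating a byte sequence's occurrence probability in goodware concrete, the following is a minimal sketch and not Hancock's actual model. It assumes a fixed-order Markov chain over bytes with add-one smoothing; the abstract does not state the model's order, smoothing, or training procedure, so the class name `ByteMarkovModel`, the `order` parameter, and the toy training data are all hypothetical.

```python
import math
from collections import defaultdict


class ByteMarkovModel:
    """Order-k Markov chain over bytes with add-one smoothing (illustrative only)."""

    def __init__(self, order=2):
        self.order = order
        # context (bytes of length `order`) -> {next byte value: count}
        self.counts = defaultdict(lambda: defaultdict(int))
        # context -> total number of transitions observed from that context
        self.totals = defaultdict(int)

    def train(self, data: bytes) -> None:
        """Count byte transitions in one goodware sample."""
        k = self.order
        for i in range(k, len(data)):
            ctx = data[i - k:i]
            self.counts[ctx][data[i]] += 1
            self.totals[ctx] += 1

    def log_prob(self, seq: bytes) -> float:
        """Natural-log probability of seq's transitions, given its first k bytes."""
        k = self.order
        total = 0.0
        for i in range(k, len(seq)):
            ctx = seq[i - k:i]
            count = self.counts.get(ctx, {}).get(seq[i], 0)
            # Add-one smoothing over the 256 possible next-byte values.
            total += math.log((count + 1) / (self.totals.get(ctx, 0) + 256))
        return total


# Toy stand-in for a goodware corpus (hypothetical data, not real binaries).
model = ByteMarkovModel(order=2)
model.train(b"push ebp; mov ebp, esp; sub esp, 8; " * 50)

common = b"mov ebp, esp"                                   # seen often in training
rare = b"\x8f\x13\xe7\x02\xc4\x99\x5a\x00\xfe\x31\xb7\x6d"  # never seen in training
assert model.log_prob(common) > model.log_prob(rare)
```

Under this sketch, a candidate signature with a very low log probability is one the model considers unlikely to appear in goodware, which is the property a low false positive rate depends on. A real system would train on a large goodware corpus and calibrate a rejection threshold empirically.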


Keywords: malware signatures, signature generation, Markov model, library function identification, diversity-based heuristics





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kent Griffin
  • Scott Schneider
  • Xin Hu
  • Tzi-cker Chiueh

All authors: Symantec Research Laboratories, USA
