Mining Malware Specifications through Static Reachability Analysis

  • Hugo Daniel Macedo
  • Tayssir Touili
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8134)


The amount of malicious software (malware) is growing out of control. Detection based on syntactic signatures cannot cope with this growth, and the manual construction of malware signature databases needs to be replaced by approaches in which computers learn the signatures. Currently, a single modern signature that captures the semantics of a malicious behavior can replace an arbitrarily large number of old-fashioned syntactic signatures. However, teaching computers to learn such behaviors is a challenge. Existing work relies on dynamic analysis to extract malicious behaviors, but dynamic analysis does not guarantee that all behaviors are covered. To sidestep this limitation, we show how to learn malware signatures using static reachability analysis. The idea is to model binary programs as pushdown systems (which can model the stack operations that occur during the execution of binary code), to use reachability analysis to extract behaviors in the form of trees, and to use as signatures the subtrees that are common to the trees extracted from a training set of malware files. To detect malware, we propose to use a tree automaton that compactly stores the malicious behavior trees, and to check whether any of the subtrees extracted from the file under analysis is malicious. Experimental results show that our approach can learn signatures from a training set of malware files and use them to detect a test set of malware that is 10 times the size of the training set.
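The approach builds on reachability analysis of pushdown systems. As background only, the following is a minimal Python sketch of the standard saturation procedure that computes post* (every configuration reachable from a regular set of initial configurations) for a pushdown system whose rules pop the top stack symbol, replace it, or replace it by two symbols. It assumes the initial configurations are given by a finite automaton with no transitions into control states and no epsilon transitions. The rule encoding and the names `post_star` and `accepts`, as well as the toy rules, are hypothetical; this is not the authors' tree-extraction procedure.

```python
from collections import defaultdict, deque

EPS = None  # label of an epsilon transition in the automaton


def post_star(rules, init_trans):
    """Saturation procedure for post* of a pushdown system.

    rules: list of ((p, g), (p2, word)) meaning <p, g> -> <p2, word>,
           where word is a tuple of 0, 1 or 2 stack symbols.
    init_trans: transitions (src, symbol, dst) of an automaton accepting the
           initial configurations; no transition may enter a control state.
    Returns the transition set of an automaton accepting post*.
    """
    by_lhs = defaultdict(list)
    for (p, g), rhs in rules:
        by_lhs[(p, g)].append(rhs)

    rel = set()                # transitions already processed
    work = deque(init_trans)   # FIFO: initial transitions are processed first
    while work:
        t = work.popleft()
        if t in rel:
            continue
        rel.add(t)
        p, g, q = t
        if g is EPS:
            # Epsilon move p -> q: p inherits every outgoing move of q.
            for (s, a, d) in list(rel):
                if s == q and a is not EPS:
                    work.append((p, a, d))
            continue
        for p2, word in by_lhs[(p, g)]:
            if len(word) == 0:            # pop rule
                work.append((p2, EPS, q))
            elif len(word) == 1:          # swap rule
                work.append((p2, word[0], q))
            else:                         # push rule <p, g> -> <p2, g1 g2>
                g1, g2 = word
                mid = ("mid", p2, g1)     # one fresh state per (p2, g1)
                work.append((p2, g1, mid))
                new = (mid, g2, q)
                if new not in rel:
                    rel.add(new)
                    # States that already have an epsilon move into mid.
                    for (s, a, d) in list(rel):
                        if a is EPS and d == mid:
                            work.append((s, g2, q))
    return rel


def accepts(rel, finals, control, stack):
    """Is configuration <control, stack> (top of stack first) accepted?"""
    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            s = todo.pop()
            for (src, a, d) in rel:
                if src == s and a is EPS and d not in seen:
                    seen.add(d)
                    todo.append(d)
        return seen

    current = closure({control})
    for sym in stack:
        current = closure({d for (s, a, d) in rel if s in current and a == sym})
    return bool(current & finals)


# Toy program: m0 calls a function starting at f0 and pushes return address m1.
rules = [
    (("p", "m0"), ("p", ("f0", "m1"))),  # call
    (("p", "f0"), ("p", ())),            # return from the function
    (("p", "m1"), ("p", ("m2",))),       # instruction after the call
]
rel = post_star(rules, [("p", "m0", "acc")])
print(accepts(rel, {"acc"}, "p", ["f0", "m1"]))  # True
print(accepts(rel, {"acc"}, "p", ["f0"]))        # False
```

In the toy system, the configuration `<p, f0 m1>` (inside the called function, return address on the stack) and `<p, m2>` (after the return) are reachable, while `<p, f0>`, which has no return address below it, is not. Representing the infinite set of reachable stacks by a finite automaton is what makes this kind of static analysis feasible.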
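The signature-mining and detection steps can be illustrated on small labeled trees. The following Python sketch rests on assumptions that are not taken from the paper: a behavior tree is an ordered tree of API-call labels, a "common subtree" is a complete subtree (a node together with all its descendants) that occurs in at least `min_support` training trees and has at least `min_size` nodes, and the detector is a deterministic bottom-up tree automaton built directly from the mined signatures. The names `mine_signatures`, `TreeAutomaton`, `min_support`, `min_size`, and the example trees are hypothetical; the paper's own notion of common subtree and its automaton construction may differ.

```python
from collections import Counter


def node(label, *children):
    """Build a tree as (label, tuple_of_children); leaves have no children."""
    return (label, tuple(children))


def subtrees(t):
    """Yield every complete subtree of t (a node together with all descendants)."""
    yield t
    for child in t[1]:
        yield from subtrees(child)


def size(t):
    return 1 + sum(size(c) for c in t[1])


def mine_signatures(training, min_support, min_size=2):
    """Complete subtrees occurring in at least min_support training trees."""
    counts = Counter()
    for t in training:
        counts.update(set(subtrees(t)))  # count each subtree once per file
    return {s for s, n in counts.items()
            if n >= min_support and size(s) >= min_size}


class TreeAutomaton:
    """Deterministic bottom-up tree automaton for a finite set of trees."""

    def __init__(self, trees):
        self.delta = {}     # (label, tuple of child states) -> state
        self.final = set()  # states reached at the root of a stored tree
        for t in trees:
            self.final.add(self._add(t))

    def _add(self, t):
        # Identical subtrees get the same state, so shared parts are stored once.
        key = (t[0], tuple(self._add(c) for c in t[1]))
        return self.delta.setdefault(key, len(self.delta))

    def _run(self, t, hits):
        # Run every child first so that matches deeper in the tree are found
        # even when the automaton gets stuck higher up.
        child_states = tuple(self._run(c, hits) for c in t[1])
        if None in child_states:
            state = None
        else:
            state = self.delta.get((t[0], child_states))
        if state in self.final:
            hits.append(t)
        return state

    def matching_subtrees(self, t):
        """All subtrees of t that equal one of the stored trees."""
        hits = []
        self._run(t, hits)
        return hits


training = [
    node("CopyFile", node("GetModuleFileName"), node("GetSystemDirectory")),
    node("Main", node("CopyFile", node("GetModuleFileName"),
                      node("GetSystemDirectory")), node("Sleep")),
    node("CopyFile", node("GetModuleFileName"), node("GetTempPath")),
]
signatures = mine_signatures(training, min_support=2)
detector = TreeAutomaton(signatures)

suspect = node("Start", node("CopyFile", node("GetModuleFileName"),
                             node("GetSystemDirectory")))
benign = node("CopyFile", node("GetTempPath"), node("GetSystemDirectory"))
print(len(detector.matching_subtrees(suspect)))  # 1
print(len(detector.matching_subtrees(benign)))   # 0
```

Because identical subtrees map to the same automaton state, signatures that share sub-behaviors are stored only once, and a single bottom-up pass over the file's tree checks all of its subtrees against all signatures. This is one plausible reading of the compact storage described above, not necessarily the authors' construction.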


Keywords: Model Check, Application Program Interface, System Call, Reachability Analysis, Tree Automaton


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hugo Daniel Macedo (1)
  • Tayssir Touili (1)
  1. LIAFA, CNRS and Univ. Paris Diderot, France
