Structural entropy and metamorphic malware

  • Donabelle Baysa
  • Richard M. Low
  • Mark Stamp
Original Paper


Metamorphic malware can change its internal structure without altering its functionality. Highly metamorphic malware has no common signature and can therefore remain undetected by standard signature scanning. In this paper, we apply previous work on structural entropy to the metamorphic detection problem. The technique analyzes variations in the complexity of data within a file and consists of two stages: file segmentation and sequence comparison. In the segmentation stage, we use entropy measurements and wavelet analysis to segment files. In the comparison stage, we measure the similarity of a pair of files by computing an edit distance between the segment sequences obtained in the first stage. We apply this similarity measure to the metamorphic detection problem and show that it yields strong results in certain challenging cases.
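The two-stage process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and parameter in it (`window_entropy`, `segment`, `similarity`, the window size, step, and threshold) is a hypothetical choice made for illustration. In particular, a single-scale Haar-style difference between neighbouring entropy values stands in for the full wavelet analysis, and segments are reduced to their rounded mean entropy before a unit-cost Levenshtein distance is computed.

```python
import math
from collections import Counter


def window_entropy(data, window=256, step=128):
    """Shannon entropy (bits per byte) of each sliding window over data."""
    if not data:
        return []
    if len(data) <= window:
        starts = [0]
    else:
        starts = range(0, len(data) - window + 1, step)
    entropies = []
    for s in starts:
        chunk = data[s:s + window]
        n = len(chunk)
        counts = Counter(chunk)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies


def segment(entropies, threshold=1.0):
    """Cut the entropy series where a Haar-style detail coefficient is large.

    Assumption: one finest-scale difference replaces a multi-scale wavelet
    analysis. Returns the mean entropy of each resulting segment.
    """
    if not entropies:
        return []
    segments, current = [], [entropies[0]]
    for prev, cur in zip(entropies, entropies[1:]):
        detail = (cur - prev) / math.sqrt(2)
        if abs(detail) > threshold:
            segments.append(sum(current) / len(current))
            current = []
        current.append(cur)
    segments.append(sum(current) / len(current))
    return segments


def edit_distance(a, b):
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x with y
        prev = cur
    return prev[-1]


def similarity(file_a, file_b):
    """Score in [0, 1]; 1.0 means the two segment sequences are identical."""
    # Assumption: each segment is quantized to its rounded mean entropy.
    seq_a = [round(m) for m in segment(window_entropy(file_a))]
    seq_b = [round(m) for m in segment(window_entropy(file_b))]
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(seq_a, seq_b) / longest
```

In this sketch, `similarity(a, b)` takes the raw bytes of two files. A detection scheme along these lines would compare a candidate file against known members of a malware family and flag it when the score exceeds a chosen threshold; how that threshold is set is outside the scope of the sketch.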


Keywords

Edit distance, Levenshtein distance, Executable file, Structural entropy, Normalized compression distance



Copyright information

© Springer-Verlag France 2013

Authors and Affiliations

  1. Department of Computer Science, San Jose State University, San Jose, USA
  2. Department of Mathematics, San Jose State University, San Jose, USA
