Entropy analysis to classify unknown packing algorithms for malware detection

  • Regular Contribution
International Journal of Information Security

Abstract

The proportion of packed malware has been growing rapidly, and packed samples now account for more than 80 % of all existing malware. In this paper, we propose a method for classifying the packing algorithms of unknown packed executables, regardless of whether they are malware or benign programs. First, we scale the entropy values of a given executable and convert the entropy values of a particular memory location into symbolic representations using symbolic aggregate approximation (SAX), which is known to be effective for large data conversions. Second, we classify the distribution of symbols using supervised learning methods, namely naive Bayes and support vector machines, to detect packing algorithms. Experiments on a collection of 324 packed benign programs and 326 packed malware programs covering 19 packing algorithms demonstrate that our method identifies the packing algorithm of a given executable with an accuracy of 95.35 %, a recall of 95.83 %, and a precision of 94.13 %. We also propose four similarity measurements for detecting packing algorithms based on SAX representations of the entropy values and an incremental aggregate analysis. Among these four metrics, the fidelity similarity measurement yields the best matching result, with an accuracy ranging from 95.0 to 99.9 %, which is 2 to 13 % higher than that of the other three metrics. Our study confirms that packing algorithms can be identified through an entropy analysis based on a measure of the uncertainty of the running processes, without prior knowledge of the executables.
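
The pipeline outlined in the abstract (entropy values, then SAX symbols, then a comparison of symbol distributions) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: entropy is computed over fixed-size windows of a static byte buffer rather than over the memory of a running process, the SAX alphabet has four symbols, and "fidelity" is taken to be the Bhattacharyya coefficient between symbol distributions, since the abstract does not define the measure. The names `shannon_entropy`, `entropy_series`, `sax`, and `fidelity` are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter

# Breakpoints splitting a standard normal distribution into four
# equiprobable regions (alphabet size 4 is an assumption of this sketch).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def entropy_series(data: bytes, window: int = 256) -> list:
    """Entropy of consecutive fixed-size windows (a static simplification)."""
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data), window)]


def sax(series, segments: int) -> str:
    """SAX word: z-normalize, average per segment (PAA), then discretize."""
    n = len(series)
    if n == 0 or not 0 < segments <= n:
        raise ValueError("need 0 < segments <= len(series)")
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    # A constant series has zero deviation; map it to all zeros.
    norm = [(x - mean) / std if std else 0.0 for x in series]
    word = []
    for i in range(segments):
        chunk = norm[i * n // segments:(i + 1) * n // segments]
        avg = sum(chunk) / len(chunk)
        word.append(ALPHABET[bisect_right(BREAKPOINTS, avg)])
    return "".join(word)


def fidelity(word_a: str, word_b: str) -> float:
    """Bhattacharyya coefficient of two symbol distributions (assumed measure)."""
    pa, pb = Counter(word_a), Counter(word_b)
    return sum(math.sqrt((pa[s] / len(word_a)) * (pb[s] / len(word_b)))
               for s in ALPHABET)
```

Under these assumptions, an unknown sample would be reduced to a SAX word with `sax(entropy_series(data), segments)` and compared against reference words for known packers with `fidelity`, where a value near 1.0 indicates nearly identical symbol distributions. The supervised classifiers named in the abstract (naive Bayes and support vector machines) would instead be trained on the symbol distributions as feature vectors.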

References

  1. Symantec Corporation: Internet Security Threat Report (2014)

  2. Choi, H., Zhu, B.B., Lee, H.: Detecting Malicious Web Links and Identifying Their Attack Types. In: WebApps (2011)

  3. Yan, W., Zhang, Z., Ansari, N.: Revealing packed malware. IEEE Secur. Priv. 6(5), 65–69 (2008)

  4. Lyda, R., Hamrock, J.: Using entropy analysis to find encrypted and packed malware. IEEE Secur. Priv. 2, 40–45 (2007)

  5. Guo, F., Ferrie, P., Chiueh, T.C.: A study of the packer problem and its solutions. In: Recent Advances in Intrusion Detection, pp. 98–115. Springer, Berlin, Heidelberg, Cambridge (2008)

  6. Shafiq, M.Z., Tabish, S.M., Mirza, F., Farooq, M.: Pe-miner: Mining structural information to detect malicious executables in realtime. In: Recent advances in Intrusion Detection, pp. 121–141. (2009)

  7. Shafiq, M.Z., Tabish, S., Farooq, M.: PE-probe: leveraging packer detection and structural information to detect malicious portable executables. In: Proceedings of the Virus Bulletin Conference (VB), pp. 29–33. (2009)

  8. Saichand, G., Kumar, T.V., Tech, M.: Malwise-An Effective and Efficient Classification System for Packed and Polymorphic Malware, IEEE Transactions on Computer, pp. 1193–1206. (2013)

  9. Liu, L., Ming, J., Wang, Z., Gao, D., Jia, C.: Denial-of-service attacks on host-based generic unpackers. In: Information and Communications Security, pp. 241–253. (2009)

  10. GitHub: PEID ser db 2 Yara Conversion. https://github.com/ocean1/peid2yara (2014)

  11. Pasha, M.M.R., Prathima, M.Y., Thirupati, M.L.: Malwise System for Packed and Polymorphic Malware, pp. 167–172. (2014)

  12. Briones, I., Gomez, A.: Graphs, entropy and grid computing: automatic comparison of malware. In: Virus Bulletin Conference, pp. 1–12. (2014)

  13. Sun, L., Versteeg, S., Bozta, S., Yann, T.: Pattern recognition techniques for the classification of malware packers. In: Information Security and Privacy, pp. 370–390. (2010)

  14. Adrian, M.: An Analysis of Simile. http://www.securityfocus.com/infocus/1671 (2003)

  15. Jacob, G., Comparetti, P.M., Neugschwandtner, M., Kruegel, C., Vigna, G.: A static, packer-agnostic filter to detect similar malware samples. In: Detection of intrusions and Malware, and vulnerability assessment, pp. 102–122. (2012)

  16. Perdisci, R., Lanzi, A., Lee, W.: Classification of packed executables for accurate computer virus detection. Pattern Recognit. Lett. 29(14), 1941–1946 (2008)

  17. Santos, I., Ugarte-Pedrero, X., Sanz, B., Laorden, C., Bringas, P.G.: Collective classification for packed executable identification. In: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, pp. 23–30. ACM (2011)

  18. Cesare, S., Xiang, Y.: Classification of malware using structured control flow. In: Proceedings of the Eighth Australasian Symposium on Parallel and Distributed Computing-vol. 107, pp. 61–70. (2010)

  19. Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 470–478. ACM (2004)

  20. Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: IEEE Symposium on Security and Privacy, Proceedings, pp. 38–49. IEEE (2001)

  21. Stolfo, S.J., Wang, K., Li, W.J.: Towards stealthy malware detection. In: Malware Detection, pp. 231–249. Springer, US (2007)

  22. Tian, R., Batten, L., Islam, R., Versteeg, S.: An automated classification system based on the strings of trojan and virus families. In: MALWARE International Conference on, pp. 23–30. IEEE (2009)

  23. Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: NDSS 9, 8–11 (2009)

  24. Christodorescu, M., Jha, S., Kruegel, C.: Mining specifications of malicious behavior. In: Proceedings of the 1st India software engineering conference, pp. 5–14. ACM (2008)

  25. Kolbitsch, C., Comparetti, P.M., Kruegel, C., Kirda, E., Zhou, X.Y., Wang, X.: Effective and efficient malware detection at the end host. In: USENIX Security Symposium, pp. 351–366. (2009)

  26. Szor, P.: The Art of Computer Virus Research and Defense. Pearson Education, New York (2005)

  27. Lee, J., Jeong, K., Lee, H.: Detecting metamorphic malwares using code graphs. In: Proceedings of the ACM Symposium on Applied Computing, pp. 1970–1977. (2010)

  28. Vapnik, V.N., Chervonenkis, A.J.: Theory of pattern Recognition: Statistical Problems of Learning, Nauka (1974)

  29. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media, New York (2013)

  30. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)

  31. Jeong, G., Choo, E., Lee, J., Bat-Erdene, M., Lee, H.: Generic unpacking using entropy analysis. In: Malicious and Unwanted Software (MALWARE), pp. 98–105. IEEE (2010)

  32. Martignoni, L., Christodorescu, M., Jha, S.: Omniunpack: Fast, generic, and safe unpacking of malware. In: Computer Security Applications Conference, ACSAC, pp. 431–441. IEEE (2007)

  33. Kang, M.G., Poosankam, P., Yin, H.: Renovo: A hidden code extractor for packed executables. In: Proceedings of the ACM workshop on Recurring malcode, pp. 46–53. ACM (2007)

  34. Pietrek, M.: An In-depth Look into the Win32 Portable Executable File Format (2002)

  35. Yeung, R.W.: A First Course in Information Theory. Springer Science & Business Media, New York (2012)

  36. Costa, M., Goldberger, A.L., Peng, C.K.: Multiscale entropy analysis of biological signals. Phys. Rev. E 71(2), 1–18 (2005)

  37. Costa, M., Healey, J.A.: Multiscale entropy analysis of complex heart rate dynamics: discrimination of age and heart failure effects. In: Computers in Cardiology, pp. 705–708. IEEE (2003)

  38. Costa, M., Goldberger, A.L., Peng, C.K.: Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 89(6), 21–24 (2002)

  39. Nikulin, V.V., Brismar, T.: Comment on multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 92(8), 804–812 (2004)

  40. Pincus, S.M.: Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. 88(6), 2297–2301 (1991)

  41. Pincus, S.M.: Assessing serial irregularity and its implications for health. Ann. NY Acad. Sci. 954(1), 245–267 (2001)

  42. Richman, J.S., Moorman, J.R.: Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart. Circ. Physiol. 278(6), H2039–H2049 (2000)

  43. Lake, D.E., Richman, J.S., Griffin, M.P., Moorman, J.R.: Sample entropy analysis of neonatal heart rate variability. Am. J. Physiol. Regul. Integ. Comp. Physiol. 283(3), R789–R797 (2002)

  44. Chakrabarti, K., Keogh, E., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans. Database Syst. (TODS) 27(2), 188–228 (2002)

  45. Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pp. 2–11. ACM (2003)

  46. Yi, B.K., Faloutsos, C.: Fast time sequence indexing for arbitrary Lp norms. VLDB, In: Proceedings of the 26th International Conference on Very Large Data Bases, pp. 385–394. (2000)

  47. Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min. Knowl. Discov. 7(4), 349–371 (2003)

  48. Meijer, B.R.: Rules and algorithms for the design of templates for template matching. In: Pattern Recognition, Conference A: Computer Vision and Applications, In: Proceedings of the 11th IAPR International Conference on, pp. 760–763. IEEE (1992)

  49. Baranovich, A.: VX heavens. http://vx.netlux.org

  50. Georgia Tech Information Security Center: Offensive computing (2005)

  51. Han, K.S., Lim, J.H., Kang, B., Im, E.G.: Malware analysis using visualized images and entropy graphs. Int. J. Inf. Secur. 14(1), 1–14 (2015)

  52. Bat-Erdene, M., Kim, T., Li, H., Lee, H.: Dynamic classification of packing algorithms for inspecting executables using entropy analysis. In: MALWARE, 8th International Conference on, pp. 19–26. IEEE (2013)

Acknowledgments

A preliminary version of this paper was presented at the 8th IEEE International Conference on Malware 2013 [52].

M.-S.Choi acknowledges the support by the National Research Foundation of Korea (Grant No. 2015-003689).

Author information

Corresponding author

Correspondence to Heejo Lee.

About this article

Cite this article

Bat-Erdene, M., Park, H., Li, H. et al. Entropy analysis to classify unknown packing algorithms for malware detection. Int. J. Inf. Secur. 16, 227–248 (2017). https://doi.org/10.1007/s10207-016-0330-4

  • DOI: https://doi.org/10.1007/s10207-016-0330-4
