Multi-labeling of Malware Samples Using Behavior Reports and Fuzzy Hashing

  • Conference paper
Telematics and Computing (WITCOM 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1906)

Abstract

Current binary and multi-class (family) approaches to malware classification are of little use for identifying and analyzing other samples. Popular family classification methods lack formal naming definitions and cannot describe samples that exhibit one or several behaviors. The alternatives, such as detailed manual analysis of malware samples, are expensive in both time and computational resources. This creates the need for a middle ground that speeds up sample labeling while also yielding a better description of sample behavior. In this paper, we propose a new automated malware sample labeling scheme. The scheme assigns a set of labels to each sample by mapping keywords found in the file, behavior, and analysis reports provided by VirusTotal to a proposed multi-label, behavior-focused taxonomy, and by measuring the similarity between samples using multiple fuzzy hashing functions.
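As a rough illustration of the two ingredients named in the abstract, the sketch below maps report keywords to a set of labels and scores the similarity of two samples. It is not the authors' implementation. The taxonomy, its keywords, and the function names are hypothetical, and the similarity score uses Python's difflib as a stand-in; it is not a fuzzy hash such as those the paper relies on.

```python
import re
from difflib import SequenceMatcher

# Hypothetical taxonomy (label -> keywords); NOT the taxonomy proposed in the paper.
TAXONOMY = {
    "ransomware": {"encrypt", "ransom"},
    "spyware": {"keylog", "screenshot"},
    "botnet": {"irc", "c2"},
}


def label_report(report_text: str) -> set[str]:
    """Return every label that has at least one keyword in the report text."""
    tokens = set(re.findall(r"[a-z0-9]+", report_text.lower()))
    return {label for label, keywords in TAXONOMY.items() if tokens & keywords}


def similarity(a: bytes, b: bytes) -> int:
    """Stand-in similarity score from 0 to 100.

    Assumption: difflib's sequence ratio replaces a real fuzzy hash
    comparison here, only to keep the example free of dependencies.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


if __name__ == "__main__":
    report = "The sample will encrypt files and drop a ransom note"
    print(label_report(report))
    print(similarity(b"MZ\x90\x00payload-A", b"MZ\x90\x00payload-B"))
```

A sample can receive several labels at once because each label is tested independently. In practice the stand-in score would be replaced by comparisons of real fuzzy hash digests.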

Acknowledgements

This work has been supported by the CONACyT and the Instituto Politécnico Nacional.

Author information

Corresponding author

Correspondence to Rolando Sánchez-Fraga.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sánchez-Fraga, R., Acosta-Bermejo, R., Aguirre-Anaya, E. (2023). Multi-labeling of Malware Samples Using Behavior Reports and Fuzzy Hashing. In: Mata-Rivera, M.F., Zagal-Flores, R., Barria-Huidobro, C. (eds) Telematics and Computing. WITCOM 2023. Communications in Computer and Information Science, vol 1906. Springer, Cham. https://doi.org/10.1007/978-3-031-45316-8_19

  • DOI: https://doi.org/10.1007/978-3-031-45316-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45315-1

  • Online ISBN: 978-3-031-45316-8

  • eBook Packages: Computer Science, Computer Science (R0)