Multi-labeling of Malware Samples Using Behavior Reports and Fuzzy Hashing

Sánchez-Fraga, Rolando; Acosta-Bermejo, Raúl; Aguirre-Anaya, Eleazar

doi:10.1007/978-3-031-45316-8_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1906)

Included in the following conference series:

International Congress of Telematics and Computing

Abstract

Current binary and multi-class (family) approaches to malware classification are of little use for identifying and analyzing other samples. Popular family classification methods lack formal naming definitions and cannot describe samples that exhibit single or multiple behaviors. The alternative, detailed manual analysis of malware samples, is expensive in both time and computational resources. This creates a need for an intermediate approach that speeds up the labeling of samples while also describing their behavior better. In this paper, we propose a new automated malware sample labeling scheme. The scheme assigns a set of labels to each sample by mapping keywords found in the file, behavior, and analysis reports provided by VirusTotal to a proposed multi-label, behavior-focused taxonomy, and by measuring similarity between samples using multiple fuzzy hashing functions.
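The keyword-mapping step can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the three-label `TAXONOMY`, its keywords, the prefix-matching rule, and the report field names are hypothetical and are not the taxonomy or the matching procedure proposed in the paper. The fuzzy hashing similarity step is left out because it depends on third-party libraries.

```python
import re

# Hypothetical taxonomy: label -> keyword prefixes.
# Illustrative only; this is NOT the taxonomy proposed in the paper.
TAXONOMY = {
    "ransomware": {"encrypt", "ransom"},
    "spyware": {"keylog", "screenshot"},
    "persistence": {"autorun", "registry"},
}


def flatten_text(node):
    """Yield every string (dict keys and string values) in a nested report."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield str(key)
            yield from flatten_text(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from flatten_text(item)
    elif isinstance(node, str):
        yield node


def label_report(report, taxonomy=TAXONOMY):
    """Return the set of labels whose keywords prefix-match a report token."""
    tokens = set()
    for text in flatten_text(report):
        # Lowercase and split on anything that is not a letter or digit.
        tokens.update(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        label
        for label, keywords in taxonomy.items()
        if any(tok.startswith(kw) for tok in tokens for kw in keywords)
    }


# Hypothetical, already-parsed report fragment (field names are assumptions).
sample_report = {
    "tags": ["keylogger"],
    "registry_keys_set": [{"key": "HKCU\\Software\\Run"}],
    "files_written": ["readme_ransom.txt"],
}
labels = label_report(sample_report)
```

Because a sample receives every label whose keywords match, a single report can yield several labels at once, which is the multi-label property the abstract describes.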

Acknowledgements

This work was supported by CONACyT and the Instituto Politécnico Nacional.

Author information

Authors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Laboratorio de Ciberseguridad, Av. Juan de Dios Bátiz, Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Mexico City, Mexico
Rolando Sánchez-Fraga, Raúl Acosta-Bermejo & Eleazar Aguirre-Anaya

Corresponding author

Correspondence to Rolando Sánchez-Fraga.

Editor information

Editors and Affiliations

UPIITA - Instituto Politécnico Nacional, México, Mexico
Miguel Félix Mata-Rivera
ESCOM - Instituto Politécnico Nacional, México, Mexico
Roberto Zagal-Flores
Universidad Mayor, Santiago, Chile
Cristian Barria-Huidobro

About this paper

Cite this paper

Sánchez-Fraga, R., Acosta-Bermejo, R., Aguirre-Anaya, E. (2023). Multi-labeling of Malware Samples Using Behavior Reports and Fuzzy Hashing. In: Mata-Rivera, M.F., Zagal-Flores, R., Barria-Huidobro, C. (eds) Telematics and Computing. WITCOM 2023. Communications in Computer and Information Science, vol 1906. Springer, Cham. https://doi.org/10.1007/978-3-031-45316-8_19

DOI: https://doi.org/10.1007/978-3-031-45316-8_19
Published: 06 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45315-1
Online ISBN: 978-3-031-45316-8
eBook Packages: Computer Science, Computer Science (R0)
