Abstract
Current binary and multi-class (family) approaches to malware classification offer little help in identifying and analyzing other samples. Popular family classification methods lack formal naming definitions and cannot describe samples with either single or multiple behaviors. The alternatives, such as detailed manual analysis of malware samples, are expensive in both time and computational resources. This creates the need for an intermediate point at which sample labeling can be sped up while still yielding a better description of sample behavior. In this paper, we propose a new automated malware sample labeling scheme. The scheme assigns a set of labels to each sample by mapping keywords found in the file, behavior, and analysis reports provided by VirusTotal to a proposed multi-label, behavior-focused taxonomy, and it measures similarity between samples using multiple fuzzy hashing functions.
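The keyword-to-label mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the taxonomy labels, the keywords, and the report field names below are all hypothetical placeholders, and the fuzzy hashing similarity step is left out.

```python
from typing import Any, Dict, Iterator, Set

# Hypothetical taxonomy (label -> trigger keywords); illustrative only,
# not the taxonomy proposed in the paper.
TAXONOMY: Dict[str, Set[str]] = {
    "ransomware": {"encrypt", "ransom", "vssadmin"},
    "spyware": {"keylog", "screenshot", "clipboard"},
    "persistence": {"autorun", "schtasks", "currentversion\\run"},
}


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found in a nested dict/list report."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def label_sample(report: Dict[str, Any]) -> Set[str]:
    """Return every taxonomy label whose keywords occur in the report text."""
    text = " ".join(iter_strings(report)).lower()
    return {
        label
        for label, keywords in TAXONOMY.items()
        if any(keyword in text for keyword in keywords)
    }


# Hypothetical report fragment; the field names are assumptions, not a real schema.
example_report = {
    "processes_created": ["vssadmin.exe delete shadows /all"],
    "registry_keys_set": [
        {"key": "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\upd"}
    ],
}

labels = label_sample(example_report)
```

Because each sample receives a set of labels rather than a single family name, a sample that both deletes shadow copies and sets a Run key is described by both behaviors at once.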
Acknowledgements
This work has been supported by the CONACyT and the Instituto Politécnico Nacional.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sánchez-Fraga, R., Acosta-Bermejo, R., Aguirre-Anaya, E. (2023). Multi-labeling of Malware Samples Using Behavior Reports and Fuzzy Hashing. In: Mata-Rivera, M.F., Zagal-Flores, R., Barria-Huidobro, C. (eds) Telematics and Computing. WITCOM 2023. Communications in Computer and Information Science, vol 1906. Springer, Cham. https://doi.org/10.1007/978-3-031-45316-8_19
DOI: https://doi.org/10.1007/978-3-031-45316-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45315-1
Online ISBN: 978-3-031-45316-8
eBook Packages: Computer Science, Computer Science (R0)