Abstract
Each year, a huge number of malicious programs are released, which makes malware detection a critical task in computer security. Antivirus products use various methods to detect malware, such as signature-based and heuristic-based techniques. Polymorphic and metamorphic malware employ obfuscation techniques to bypass these traditional detection methods, and the number of such malware has recently increased dramatically. Most previously proposed detection methods are based on high-level features such as opcodes, function calls, or the program’s control flow graph (CFG). Because of new obfuscation techniques, extracting high-level features is difficult, error-prone, and time-consuming; hence, approaches that operate on the program’s bytes are quicker and more accurate. In this paper, a novel byte-level method for detecting malware using audio signal processing techniques is presented. In the proposed method, a program’s bytes are converted to a meaningful audio signal, and Music Information Retrieval (MIR) techniques are then employed to construct a machine learning music classification model from the audio signals, which is used to detect new and unseen instances. Experiments evaluate the influence of different strategies for converting bytes to audio signals and the effectiveness of the method.
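To make the bytes-to-audio idea concrete, the following is a minimal sketch of one possible conversion strategy; it is not the conversion used in the paper. It assumes each byte is folded into the MIDI note range and rendered as a short sine tone, and it computes two elementary audio descriptors (mean energy and zero-crossing rate) as stand-ins for the MIR features a real classifier would consume. The function names, the sample rate, and the note length are all hypothetical choices made for illustration.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed value)
NOTE_SAMPLES = 441    # samples per byte, i.e. 10 ms per note (assumed value)


def byte_to_frequency(b: int) -> float:
    """Fold a byte into the MIDI note range 0..127, then apply the
    standard MIDI-to-Hz formula (note 69 = A4 = 440 Hz)."""
    note = b % 128
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def bytes_to_signal(data: bytes) -> list[float]:
    """Render each byte as a short sine tone and concatenate the tones."""
    signal: list[float] = []
    for b in data:
        freq = byte_to_frequency(b)
        signal.extend(
            math.sin(2.0 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(NOTE_SAMPLES)
        )
    return signal


def simple_features(signal: list[float]) -> dict[str, float]:
    """Compute mean energy and zero-crossing rate of the signal.
    These are illustrative placeholders for richer MIR features."""
    energy = sum(s * s for s in signal) / len(signal)
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    return {"energy": energy, "zcr": crossings / (len(signal) - 1)}


if __name__ == "__main__":
    # Hypothetical input: the first bytes of a PE file header.
    sample = b"MZ\x90\x00\x03\x00\x00\x00"
    print(simple_features(bytes_to_signal(sample)))
```

In such a pipeline, one feature vector would be extracted per program and passed to an ordinary supervised classifier trained on labeled benign and malicious samples.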
Cite this article
Farrokhmanesh, M., Hamzeh, A. Music classification as a new approach for malware detection. J Comput Virol Hack Tech 15, 77–96 (2019). https://doi.org/10.1007/s11416-018-0321-2
DOI: https://doi.org/10.1007/s11416-018-0321-2