Abstract
Effective and efficient malware detection is at the forefront of research into building secure digital systems. As in many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been widely used for pattern matching in general, and for malware detection in particular, is the hidden Markov model (HMM). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult “cold start” cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
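To make the random-restart idea concrete, the following is a minimal sketch, not the implementation used in this work. It trains a discrete HMM with Baum-Welch re-estimation from several random initializations and keeps the model with the highest training log-likelihood. The function names (`train_hmm`, `train_with_restarts`), the near-uniform initialization, the fixed iteration count, and the choice of training log-likelihood as the selection criterion are all assumptions made for illustration.

```python
import numpy as np

def _random_stochastic(rng, rows, cols):
    """Row-stochastic matrix with near-uniform entries (assumed initialization)."""
    m = 1.0 + 0.1 * rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

def _forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass; returns alpha, beta and scale factors c."""
    T, N = len(obs), A.shape[0]
    alpha, beta, c = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def train_hmm(obs, N, M, rng, iters=50):
    """One hill climb: Baum-Welch from a random start. Returns (model, log-likelihood)."""
    obs = np.asarray(obs)
    A = _random_stochastic(rng, N, N)       # state transition matrix
    B = _random_stochastic(rng, N, M)       # observation probability matrix
    pi = _random_stochastic(rng, 1, N)[0]   # initial state distribution
    for _ in range(iters):
        alpha, beta, c = _forward_backward(obs, A, B, pi)
        gamma = alpha * beta                # state posteriors, shape (T, N)
        # Pairwise posteriors xi[t, i, j], shape (T-1, N, N)
        xi = (alpha[:-1, :, None] * A[None]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
        B /= gamma.sum(axis=0)[:, None]
    loglik = np.log(_forward_backward(obs, A, B, pi)[2]).sum()
    return (A, B, pi), loglik

def train_with_restarts(obs, N, M, restarts=10, seed=0):
    """Run several independent hill climbs and keep the best-scoring model."""
    rng = np.random.default_rng(seed)
    runs = [train_hmm(obs, N, M, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])
```

Scoring a sample with the selected model requires only a single forward pass, whereas a boosted ensemble must evaluate every component model, which is the source of the higher scoring cost noted above.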
Cite this article
Raghavan, A., Di Troia, F. & Stamp, M. Hidden Markov models with random restarts versus boosting for malware detection. J Comput Virol Hack Tech 15, 97–107 (2019). https://doi.org/10.1007/s11416-018-0322-1
DOI: https://doi.org/10.1007/s11416-018-0322-1