Hidden Markov models with random restarts versus boosting for malware detection

Raghavan, Aditya; Di Troia, Fabio; Stamp, Mark

doi:10.1007/s11416-018-0322-1

Original Paper
Published: 28 August 2018

Volume 15, pages 97–107, (2019)

Journal of Computer Virology and Hacking Techniques

Aditya Raghavan¹,
Fabio Di Troia¹ &
Mark Stamp¹

Abstract

Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general—and malware detection in particular—is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult “cold start” cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
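
The random-restart idea described in the abstract can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: it trains a small discrete HMM with Baum-Welch from several random initialisations and keeps the model with the highest training log-likelihood. The function names (`train_hmm`, `train_with_restarts`), the near-uniform initialisation, the fixed iteration count, and the choice of training log-likelihood as the selection criterion are all assumptions made for the example. The AdaBoost side of the comparison is not shown.

```python
import math
import random


def random_row(n, rng):
    """Return n positive probabilities that sum to 1, close to uniform."""
    raw = [1.0 + 0.5 * rng.random() for _ in range(n)]
    total = sum(raw)
    return [v / total for v in raw]


def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes; also returns log P(obs | model)."""
    N, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(N)]
        else:
            row = [sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                   for i in range(N)]
        c = 1.0 / sum(row)
        alpha.append([c * v for v in row])
        scale.append(c)
    beta = [[scale[T - 1]] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [scale[t] * sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                                  for j in range(N)) for i in range(N)]
    return alpha, beta, -sum(math.log(c) for c in scale)


def reestimate(obs, pi, A, B):
    """One Baum-Welch update of (pi, A, B) for a single observation sequence."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, _ = forward_backward(obs, pi, A, B)
    gamma = [[0.0] * N for _ in range(T)]
    xi_sum = [[0.0] * N for _ in range(N)]
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                x = alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                xi_sum[i][j] += x
                gamma[t][i] += x
    gamma[T - 1] = list(alpha[T - 1])
    new_A, new_B = [], []
    for i in range(N):
        # "or 1.0" avoids dividing by zero if a state is never occupied.
        visits = sum(gamma[t][i] for t in range(T - 1)) or 1.0
        new_A.append([xi_sum[i][j] / visits for j in range(N)])
        total = sum(gamma[t][i] for t in range(T)) or 1.0
        emit = [0.0] * M
        for t in range(T):
            emit[obs[t]] += gamma[t][i]
        new_B.append([v / total for v in emit])
    return list(gamma[0]), new_A, new_B


def train_hmm(obs, n_states, n_symbols, rng, n_iters=30):
    """Hill climb from one random starting point; returns (model, log-likelihood)."""
    pi = random_row(n_states, rng)
    A = [random_row(n_states, rng) for _ in range(n_states)]
    B = [random_row(n_symbols, rng) for _ in range(n_states)]
    for _ in range(n_iters):
        pi, A, B = reestimate(obs, pi, A, B)
    return (pi, A, B), forward_backward(obs, pi, A, B)[2]


def train_with_restarts(obs, n_states, n_symbols, n_restarts=5, seed=0):
    """Train n_restarts models from different initial values and keep the best."""
    rng = random.Random(seed)
    runs = [train_hmm(obs, n_states, n_symbols, rng) for _ in range(n_restarts)]
    best_model, best_loglik = max(runs, key=lambda run: run[1])
    return best_model, best_loglik, [loglik for _, loglik in runs]
```

Calling `train_with_restarts(obs, n_states=2, n_symbols=3)` on a list of integer-coded observations returns the selected model, its log-likelihood, and the log-likelihood of every restart, which makes it easy to see how much the hill climb depends on its starting point. Only one model is kept, so scoring costs the same as for a single HMM, whereas a boosted ensemble has to score against every model it contains.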

Author information

Authors and Affiliations

Department of Computer Science, San Jose State University, San Jose, USA
Aditya Raghavan, Fabio Di Troia & Mark Stamp

Corresponding author

Correspondence to Mark Stamp.

About this article

Cite this article

Raghavan, A., Di Troia, F. & Stamp, M. Hidden Markov models with random restarts versus boosting for malware detection. J Comput Virol Hack Tech 15, 97–107 (2019). https://doi.org/10.1007/s11416-018-0322-1

Received: 08 June 2018
Accepted: 06 August 2018
Published: 28 August 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s11416-018-0322-1
