Hidden Markov models with random restarts versus boosting for malware detection

  • Original Paper
  • Published in: Journal of Computer Virology and Hacking Techniques

Abstract

Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general—and malware detection in particular—is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult “cold start” cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
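The random-restart idea described in the abstract can be illustrated with a minimal sketch. It assumes a discrete HMM trained by Baum-Welch re-estimation in pure Python, and it selects the restart with the highest training log-likelihood. The helper names (`train`, `train_with_restarts`), the toy observation sequence, and the selection criterion are all illustrative assumptions, not the authors' implementation or experimental setup.

```python
import math
import random


def random_row(n, rng):
    """Return n positive weights that sum to 1 (a noisy near-uniform row)."""
    row = [1.0 + 0.5 * rng.random() for _ in range(n)]
    total = sum(row)
    return [v / total for v in row]


def init_model(n_states, n_symbols, rng):
    """Random row-stochastic starting point (pi, A, B) for the hill climb."""
    pi = random_row(n_states, rng)
    A = [random_row(n_states, rng) for _ in range(n_states)]
    B = [random_row(n_symbols, rng) for _ in range(n_states)]
    return pi, A, B


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; also returns log P(obs | model)."""
    N, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[j] * B[j][obs[0]] for j in range(N)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                   for j in range(N)]
        scale.append(sum(row))
        alpha.append([v / scale[t] for v in row])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N)) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(c) for c in scale)


def baum_welch_step(obs, pi, A, B):
    """One re-estimation step; returns the updated (pi, A, B)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale, _ = forward_backward(obs, pi, A, B)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
    new_A = [[0.0] * N for _ in range(N)]
    new_B = [[0.0] * M for _ in range(N)]
    for i in range(N):
        visits = sum(gamma[t][i] for t in range(T - 1))
        for j in range(N):
            moves = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                        / scale[t + 1] for t in range(T - 1))
            new_A[i][j] = moves / visits
        total = visits + gamma[T - 1][i]
        for t in range(T):
            new_B[i][obs[t]] += gamma[t][i] / total
    return gamma[0][:], new_A, new_B


def train(obs, n_states, n_symbols, seed, iters=30):
    """Train one HMM from a seeded random start; return (model, log-likelihood)."""
    model = init_model(n_states, n_symbols, random.Random(seed))
    for _ in range(iters):
        model = baum_welch_step(obs, *model)
    return model, forward_backward(obs, *model)[3]


def train_with_restarts(obs, n_states, n_symbols, restarts=10, iters=30):
    """Keep the restart whose model scores the training data highest."""
    runs = [train(obs, n_states, n_symbols, seed, iters) for seed in range(restarts)]
    return max(runs, key=lambda run: run[1])


# Hypothetical toy sequence standing in for an encoded feature stream.
obs = [0, 1, 0, 1, 2, 2, 2, 0, 1, 0, 1, 2, 2, 0, 1, 0, 2, 2, 2, 1] * 3
_, single_score = train(obs, n_states=2, n_symbols=3, seed=0)
best_model, best_score = train_with_restarts(obs, n_states=2, n_symbols=3, restarts=5)
print(f"single start: {single_score:.3f}  best of 5: {best_score:.3f}")
```

Because each restart is an independent hill climb, the best-of-N score can never be worse than any single run it includes. The extra cost falls in training only, since a single selected model is used for scoring. This is consistent with the abstract's remark that boosting carries a higher computational cost in the scoring phase.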

Author information

Corresponding author

Correspondence to Mark Stamp.

Cite this article

Raghavan, A., Di Troia, F. & Stamp, M. Hidden Markov models with random restarts versus boosting for malware detection. J Comput Virol Hack Tech 15, 97–107 (2019). https://doi.org/10.1007/s11416-018-0322-1

  • DOI: https://doi.org/10.1007/s11416-018-0322-1
