Hidden Markov models with random restarts versus boosting for malware detection

  • Aditya Raghavan
  • Fabio Di Troia
  • Mark Stamp
Original Paper

Abstract

Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general—and malware detection in particular—is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult “cold start” cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
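The random-restart idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it trains a small discrete HMM with Baum-Welch from several random initializations and keeps the model with the highest training log-likelihood. All names (`train`, `train_with_restarts`, `make_demo_sequence`), the model sizes, and the synthetic symbol sequence are assumptions made for illustration only.

```python
import math
import random


def random_stochastic(rows, cols, rng):
    """Return a rows x cols matrix whose rows are random probability vectors."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 0.1 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scale factors)."""
    n, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return alpha, beta, scale


def log_likelihood(obs, pi, A, B):
    """log P(obs | model), the sum of the log scale factors."""
    return sum(math.log(c) for c in forward_backward(obs, pi, A, B)[2])


def baum_welch_step(obs, pi, A, B):
    """One re-estimation step; returns the updated (pi, A, B)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_A, new_B = [], []
    for i in range(n):
        denom_a = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([
            sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                / scale[t + 1] for t in range(T - 1)) / denom_a
            for j in range(n)])
        denom_b = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / denom_b
                      for k in range(m)])
    return gamma[0], new_A, new_B


def train(obs, n_states, n_symbols, seed, iterations=20):
    """Hill climb from one random starting point; returns (score, model)."""
    rng = random.Random(seed)
    pi = random_stochastic(1, n_states, rng)[0]
    A = random_stochastic(n_states, n_states, rng)
    B = random_stochastic(n_states, n_symbols, rng)
    for _ in range(iterations):
        pi, A, B = baum_welch_step(obs, pi, A, B)
    return log_likelihood(obs, pi, A, B), (pi, A, B)


def train_with_restarts(obs, n_states, n_symbols, restarts=5, iterations=20):
    """Train once per seed 0..restarts-1 and keep the best-scoring model."""
    runs = [train(obs, n_states, n_symbols, seed, iterations)
            for seed in range(restarts)]
    return max(runs, key=lambda run: run[0])


def make_demo_sequence(length, seed):
    """Synthetic symbols from a two-regime source (illustrative data only)."""
    rng = random.Random(seed)
    seq, regime = [], 0
    for _ in range(length):
        if rng.random() < 0.1:
            regime = 1 - regime
        weights = [0.7, 0.2, 0.1] if regime == 0 else [0.1, 0.2, 0.7]
        seq.append(rng.choices([0, 1, 2], weights)[0])
    return seq


demo_obs = make_demo_sequence(150, seed=42)
best_score, best_model = train_with_restarts(demo_obs, n_states=2, n_symbols=3)
print("best training log-likelihood over restarts:", best_score)
```

In this sketch the extra cost of restarts falls entirely on training, since only one model is kept for scoring. A boosted ensemble, by contrast, has to score every sample against each of its constituent models, which matches the higher scoring-phase cost that the abstract attributes to boosting.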

Copyright information

© Springer-Verlag France SAS, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, San Jose State University, San Jose, USA
