Hidden Markov models for malware classification

Annachhatre, Chinmayee; Austin, Thomas H.; Stamp, Mark

doi:10.1007/s11416-014-0215-x

Original Paper
Published: 23 May 2014

Volume 11, pages 59–73, (2015)

Journal of Computer Virology and Hacking Techniques

Abstract

Previous research has shown that hidden Markov model (HMM) analysis is useful for detecting certain challenging classes of malware. In this research, we consider the related problem of malware classification based on HMMs. We train multiple HMMs on a variety of compilers and malware generators. More than 8,000 malware samples are then scored against these models and separated into clusters based on the resulting scores. We observe that the clustering results can be used to classify the malware samples into their appropriate families with good accuracy. Since none of the malware families in the test set were used to generate the HMMs, these results indicate that our approach can effectively classify previously unknown malware, at least in some cases. Thus, such a clustering strategy could serve as a useful tool in malware analysis and classification.
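The score-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state how scores are computed or which clustering algorithm is used, so the following are assumptions made only for illustration: each sample is scored by its length-normalized log-likelihood under the scaled forward algorithm, and the resulting score vectors are grouped with a basic k-means. The two toy models, the binary observation alphabet, and all function names are hypothetical and are not taken from the paper.

```python
import math

def log_likelihood_per_symbol(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | model) divided by len(obs)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)

def kmeans(points, k, iters=20):
    """Basic k-means (an assumed choice), seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy models over the alphabet {0, 1}; each is (pi, A, B),
# with A and B row stochastic.
A = [[0.8, 0.2], [0.3, 0.7]]
model_x = ([0.6, 0.4], A, [[0.9, 0.1], [0.8, 0.2]])  # favors symbol 0
model_y = ([0.6, 0.4], A, [[0.1, 0.9], [0.2, 0.8]])  # favors symbol 1
models = [model_x, model_y]

# Hypothetical "samples": short observation sequences.
samples = [[0, 0, 0, 1, 0, 0], [1, 1, 0, 1, 1, 1],
           [0, 0, 0, 0, 0, 1], [1, 1, 1, 1, 0, 1]]

# One score per model gives each sample a point in score space.
score_vectors = [[log_likelihood_per_symbol(pi, A_, B_, s)
                  for (pi, A_, B_) in models] for s in samples]
labels = kmeans(score_vectors, k=2)
print(labels)
```

In this sketch each sample becomes a point whose coordinates are its scores under the trained models, so samples that resemble the same models land close together even when no model was trained on their own family.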

Notes

In a row stochastic matrix, each row defines a probability distribution. That is, each element is in the range of 0 to 1, and the elements of any row must sum to 1.
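The definition in this note can be checked mechanically. The sketch below is illustrative only; the function name `is_row_stochastic`, the tolerance, and the example matrices are assumptions, not part of the paper.

```python
import math

def is_row_stochastic(matrix, tol=1e-9):
    """Return True if every entry is in [0, 1] and every row sums to 1."""
    for row in matrix:
        if any(p < 0.0 or p > 1.0 for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True

# Hypothetical 2-state transition matrix: each row is a distribution.
valid = [[0.7, 0.3],
         [0.4, 0.6]]

# First row sums to 1.1, so this is not row stochastic.
invalid = [[0.5, 0.6],
           [0.2, 0.8]]

print(is_row_stochastic(valid))
print(is_row_stochastic(invalid))
```

A check like this applies to the state transition matrix and the observation probability matrix of an HMM, both of which are row stochastic.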

Author information

Authors and Affiliations

Department of Computer Science, San Jose State University, San Jose, USA
Chinmayee Annachhatre, Thomas H. Austin & Mark Stamp

Corresponding author

Correspondence to Mark Stamp.

About this article

Cite this article

Annachhatre, C., Austin, T.H. & Stamp, M. Hidden Markov models for malware classification. J Comput Virol Hack Tech 11, 59–73 (2015). https://doi.org/10.1007/s11416-014-0215-x

Received: 21 December 2013
Accepted: 05 May 2014
Published: 23 May 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11416-014-0215-x
