Exploring Discriminatory Features for Automated Malware Classification

Yan, Guanhua; Brown, Nathan; Kong, Deguang

doi:10.1007/978-3-642-39235-1_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7967)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

The ever-growing malware threat in cyberspace calls for techniques that are more effective than widely deployed signature-based detection systems and more scalable than manual reverse engineering by forensic experts. To counter large volumes of malware variants, machine learning techniques have recently been applied to automated malware classification. Despite the successes of these efforts, we still lack a basic understanding of some key issues, such as what features we should use and which classifiers perform well on malware data. Against this backdrop, the goal of this work is to explore discriminatory features for automated malware classification. We conduct a systematic study on the discriminative power of various types of features extracted from malware programs, and experiment with different combinations of feature selection algorithms and classifiers. Our results not only offer insights into what features most distinguish malware families, but also shed light on how to develop scalable techniques for automated malware classification in practice.

Author information

Authors and Affiliations

Information Sciences (CCS-3), Los Alamos National Laboratory, USA
Guanhua Yan
Department of Electrical and Computer Engineering, Naval Postgraduate School, USA
Nathan Brown
Department of Computer Science, University of Texas, Arlington, USA
Deguang Kong

Editor information

Editors and Affiliations

Institute of Computer Science, Computer Security Group, University of Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
Konrad Rieck
Telekom Innovation Laboratories, Security in Telecommunications, Technische Universität Berlin, Ernst-Reuter-Platz 7, 10587, Berlin, Germany
Patrick Stewin & Jean-Pierre Seifert

Cite this paper

Yan, G., Brown, N., Kong, D. (2013). Exploring Discriminatory Features for Automated Malware Classification. In: Rieck, K., Stewin, P., Seifert, JP. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2013. Lecture Notes in Computer Science, vol 7967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39235-1_3

DOI: https://doi.org/10.1007/978-3-642-39235-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39234-4
Online ISBN: 978-3-642-39235-1
eBook Packages: Computer Science, Computer Science (R0)
