Feature Reduction to Speed Up Malware Classification

Moonsamy, Veelasha; Tian, Ronghua; Batten, Lynn

doi:10.1007/978-3-642-29615-4_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7161)

Included in the following conference series:

Nordic Conference on Secure IT Systems

Abstract

In statistical classification, one way to speed up the process is to use only a small percentage of the full set of available features. In this paper, we apply this technique both to the classification of malware into families and to the identification of malware within a set that also contains cleanware. To demonstrate the usefulness of our method, we use the same sets of malware and cleanware as in an earlier paper. Using the statistical technique Information Gain (IG), we reduce the number of features used in the experiment from 7,605 to just over 1,000. The best accuracy obtained in the earlier paper using all 7,605 features is 97.3% for malware versus cleanware detection and 97.4% for malware family classification; on the reduced feature set, we obtain best accuracies of 94.6% on the malware versus cleanware test and 94.5% on the malware family classification test. An interesting feature of the new tests is that false negative rates fall by a factor of about 1/3 compared with the results of the earlier paper. In addition, the running time of our tests is reduced by a factor of approximately 3/5 relative to the times reported in the original paper. The small loss in accuracy, the improved false negative rate and the significant gain in speed indicate that feature reduction should be pursued further as a tool for preventing algorithms from becoming intractable due to too much data.
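To make the IG criterion concrete, the following is a minimal sketch, not the authors' pipeline. It assumes discrete features (for example, binary presence or absence of an API call) and uses the textbook definition IG(Y; X) = H(Y) - H(Y | X), where H is Shannon entropy in bits. Features are then ranked by score and the top k are kept, in the spirit of the WEKA Ranker search method cited in the reference list. The function names, the toy feature matrix and the cut-off k are hypothetical illustrations.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature: Sequence[int], labels: Sequence[str]) -> float:
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the samples whose feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows: List[List[int]], labels: Sequence[str], k: int) -> List[int]:
    """Hypothetical helper: indices of the k highest-IG features, best first.

    Ties are broken by the lower feature index so the result is deterministic.
    """
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy data: each row is a sample, each column a binary feature.
rows = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["malware", "malware", "clean", "clean"]

print(rank_features(rows, labels, k=2))  # → [0, 2]
```

In this toy example, feature 0 separates the two classes perfectly (IG = 1 bit), feature 2 is partly informative (IG ≈ 0.311 bits) and feature 1 carries no information (IG = 0), so keeping the top two features discards feature 1. Because each feature is scored independently, the cost of this filter grows linearly with the number of features. That makes it cheap to apply before training a classifier on the reduced set.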

References

Ahmed, F., Hameed, H., Shafiq, M.Z., Farooq, M.: Using spatio-temporal information in API calls with machine learning algorithms for malware detection. In: AISec 2009: Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence, pp. 55–62 (2009)
Drozdz, K., Kwasnicka, H.: Feature Set Reduction by Evolutionary Selection and Construction. In: Jędrzejowicz, P., Nguyen, N.T., Howlett, R.J., Jain, L.C. (eds.) KES-AMSTA 2010. LNCS, vol. 6071, pp. 140–149. Springer, Heidelberg (2010)
Henchiri, O., Japkowicz, N.: A feature selection and evaluation scheme for computer virus detection. In: Proceedings of the Sixth International Conference on Data Mining, ICDM 2006, pp. 891–895 (2006)
Komashinskiy, D., Kotenko, I.: Malware detection by data mining techniques based on positionally dependent features. In: 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 617–623 (2010)
Kolter, J.Z., Maloof, M.A.: Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research 7, 2721–2744 (2006)
Li, Y., Lu, B.: Feature selection based on loss-margin of nearest neighbor classification. Pattern Recognition 42(9), 1914–1921 (2009)
Lu, Y., Din, S., Zheng, C., Gao, B.: Using multi-feature and classifier ensembles to improve malware detection. Journal of CCIT 39(2), 57–72 (2010)
Masud, M.M., Khan, L., Thuraisingham, B.: Feature Based Techniques for Auto-Detection of Novel Email Worms. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 205–216. Springer, Heidelberg (2007)
Mehdi, S.B., Tanwani, A.K., Farooq, M.: IMAD: in-execution malware analysis and detection. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO 2009, pp. 1553–1560. ACM (2009)
Moskovitch, R., Stopel, D., Feher, C., Nissim, N., Elovici, Y.: Unknown malcode detection via text categorization and the imbalance problem. In: 2008 IEEE International Conference on Intelligence and Security Informatics, pp. 156–161 (2008)
Ranker Search Method, http://weka.sourceforge.net/doc.stable/weka/attributeSelection/Ranker.html
Tian, R., Islam, R., Batten, L., Versteeg, S.: Differentiating malware from cleanware using behavioural analysis. In: Proceedings of the 5th International Conference on Malicious and Unwanted Software: MALWARE 2010 (2010)
Wang, T.-Y., Wu, C.-H., Hsieh, C.-C.: A virus prevention model based on static analysis and data mining methods. In: IEEE 8th International Conference on Computer and Information Technology Workshops, CIT Workshops 2008, pp. 288–293 (2008)
Wang, T., Wu, C., Hsieh, C.: Detecting unknown malicious executables using portable executable headers. In: Fifth International Joint Conference on INC, IMS and IDC, NCM 2009, pp. 278–284. IEEE (2009)
Waikato Environment for Knowledge Acquisition (WEKA): Data Mining Software in Java. University of Waikato, http://www.cs.waikato.ac.nz/ml/weka
Witten, I., Frank, E., Hall, M.A.: Data mining: Practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann, Burlington (2011)
Ye, Y., Li, T., Jiang, Q., Wang, Y.: CIMDS: Adapting Postprocessing Techniques of Associative Classification for Malware Detection. IEEE Transactions on Systems, Man, and Cybernetics 40(3), 298–307 (2010)

Author information

Authors and Affiliations

School of Information Technology, Deakin University, Melbourne, Australia
Veelasha Moonsamy, Ronghua Tian & Lynn Batten

Editor information

Editors and Affiliations

Cybernetica AS, Ülikooli 2, 51003, Tartu, Estonia
Peeter Laud

About this paper

Cite this paper

Moonsamy, V., Tian, R., Batten, L. (2012). Feature Reduction to Speed Up Malware Classification. In: Laud, P. (eds) Information Security Technology for Applications. NordSec 2011. Lecture Notes in Computer Science, vol 7161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29615-4_13

DOI: https://doi.org/10.1007/978-3-642-29615-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29614-7
Online ISBN: 978-3-642-29615-4
eBook Packages: Computer Science, Computer Science (R0)