Feature Reduction to Speed Up Malware Classification

  • Conference paper
Information Security Technology for Applications (NordSec 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7161)

Abstract

In statistical classification, one way to speed up the process is to use only a small percentage of the available features. In this paper, we apply this technique to both the classification of malware into families and the identification of malware within a set that also contains cleanware. To demonstrate the usefulness of our method, we use the same sets of malware and cleanware as in an earlier paper. Using the statistical technique Information Gain (IG), we reduce the set of features used in the experiment from 7,605 to just over 1,000. The best accuracy obtained in the earlier paper using all 7,605 features is 97.3% for malware versus cleanware detection and 97.4% for malware family classification; on the reduced feature set, we obtain a best accuracy of 94.6% on the malware versus cleanware test and 94.5% on the malware family classification test. An interesting result of the new tests presented here is the reduction in false negative rates by a factor of about 1/3 compared with the results of the earlier paper. In addition, the time our tests take to run is reduced by a factor of approximately 3/5 relative to the times reported in the earlier paper. The small loss in accuracy and the improved false negative rate, along with the significant improvement in speed, indicate that feature reduction should be pursued further as a tool to prevent algorithms from becoming intractable due to too much data.
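As an illustration of the feature reduction step described in the abstract, the following is a minimal Python sketch of ranking features by Information Gain and keeping the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `select_top_k`), the binary toy feature matrix, and the labels are all assumptions made for this example, and the data is invented purely to show the mechanics.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_column, labels):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(samples, labels, k):
    """Return (indices of the k highest-IG features, all IG scores)."""
    n_features = len(samples[0])
    scores = [
        information_gain([row[j] for row in samples], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 4 samples, 3 binary features (not from the paper).
samples = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["malware", "malware", "cleanware", "cleanware"]

top, scores = select_top_k(samples, labels, k=1)
print(top, scores)
```

In this toy set only the first feature separates the two classes, so it receives the highest score and is the one retained. On a real feature set, k would be chosen to trade accuracy against running time, in the spirit of the reduction from 7,605 to just over 1,000 features reported above.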

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moonsamy, V., Tian, R., Batten, L. (2012). Feature Reduction to Speed Up Malware Classification. In: Laud, P. (eds) Information Security Technology for Applications. NordSec 2011. Lecture Notes in Computer Science, vol 7161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29615-4_13

  • DOI: https://doi.org/10.1007/978-3-642-29615-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29614-7

  • Online ISBN: 978-3-642-29615-4

  • eBook Packages: Computer Science, Computer Science (R0)
