Evolutionary Cost-Sensitive Ensemble for Malware Detection

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 299)

Abstract

Malware detection is among the most extensively developed areas of computer security. Unauthorized, malicious software can cause costly damage to both private users and companies: it can destroy the computer, breach the user's privacy, and result in the loss of valuable data. The amount of data uploaded and downloaded each day makes manual screening of every incoming software package almost impossible. That is why there is a need for effective intelligent filters that can automatically distinguish between safe and dangerous applications. The number of malware programs that a detection system faces is typically much smaller than the number of legitimate programs. Therefore, we have to deal with an imbalanced classification problem, in which standard classification algorithms tend to fail. In this paper, we present a novel ensemble based on cost-sensitive decision trees. Individual classifiers are constructed according to an established cost matrix and trained on random feature subspaces to ensure that they are mutually complementary. Instead of using a fixed cost matrix, we derive its parameters via ROC analysis. An evolutionary algorithm is applied for simultaneous classifier selection and assignment of committee member weights for the fusion process. Experimental analysis, carried out on a large malware dataset, shows that our method is capable of outperforming other state-of-the-art algorithms, and hence is an effective approach to the problem of imbalanced malware detection.
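The pipeline outlined in the abstract (cost-sensitive base trees trained on random feature subspaces, followed by evolutionary selection and weighting of committee members) can be sketched roughly as follows. This is a minimal Python sketch under several assumptions that are not taken from the chapter: depth-1 stumps stand in for full cost-sensitive decision trees; the cost matrix is reduced to two hand-set values, c_fn (missed malware) and c_fp (false alarm), rather than derived via ROC analysis; members cast +1/-1 votes scaled by their weights; and the genetic algorithm maximizes the geometric mean (G-mean) of the true positive and true negative rates on a validation split. All function names and parameters are hypothetical.

```python
import random

def train_stump(X, y, features, c_fn, c_fp):
    """Cost-sensitive decision stump (depth-1 tree) restricted to `features`.

    Stand-in (assumption) for a full cost-sensitive decision tree. Each leaf
    predicts the label with the lower total misclassification cost: labelling
    a leaf benign (0) costs c_fn per malware sample in it, labelling it
    malware (1) costs c_fp per benign sample in it.
    """
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            leaves, total = [], 0.0
            for goes_left in (True, False):
                ys = [yi for row, yi in zip(X, y) if (row[f] <= t) == goes_left]
                pos = sum(ys)
                cost_benign, cost_malware = c_fn * pos, c_fp * (len(ys) - pos)
                leaves.append(1 if cost_malware < cost_benign else 0)
                total += min(cost_benign, cost_malware)
            if best is None or total < best[0]:
                best = (total, f, t, leaves[0], leaves[1])
    _, feat, thr, left, right = best
    return lambda row: left if row[feat] <= thr else right

def build_ensemble(X, y, n_members, subspace_size, c_fn, c_fp, rng):
    """Random subspace method: each member sees a random subset of the features."""
    n_features = len(X[0])
    return [train_stump(X, y, rng.sample(range(n_features), subspace_size), c_fn, c_fp)
            for _ in range(n_members)]

def fuse(members, weights, row):
    """Weighted vote; a weight of 0 removes (deselects) that member."""
    score = sum(w * (1 if m(row) == 1 else -1) for m, w in zip(members, weights))
    return 1 if score > 0 else 0

def g_mean(members, weights, X, y):
    """Geometric mean of TPR and TNR (fitness function assumed here)."""
    tp = tn = pos = neg = 0
    for row, yi in zip(X, y):
        pred = fuse(members, weights, row)
        if yi == 1:
            pos, tp = pos + 1, tp + pred
        else:
            neg, tn = neg + 1, tn + (1 - pred)
    return ((tp / pos) * (tn / neg)) ** 0.5 if pos and neg else 0.0

def evolve_weights(members, X_val, y_val, rng, pop_size=20, generations=30,
                   p_zero=0.2, sigma=0.1):
    """Simple genetic algorithm: chromosome = one weight in [0, 1] per member."""
    n = len(members)
    fitness = lambda c: g_mean(members, c, X_val, y_val)

    def mutate(c):
        child = list(c)
        for i in range(n):
            if rng.random() < 1.0 / n:  # on average one mutated gene per child
                if rng.random() < p_zero:
                    child[i] = 0.0  # deselect this member
                else:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, sigma)))
        return child

    pop = [[0.0 if rng.random() < p_zero else rng.random() for _ in range(n)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged (elitism).
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover, then mutation.
            children.append(mutate([wa if rng.random() < 0.5 else wb
                                    for wa, wb in zip(a, b)]))
        pop = elite + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data (~10% "malware"): features 0-1 informative, 2-5 noise.
    y = [1 if rng.random() < 0.1 else 0 for _ in range(300)]
    X = [[rng.gauss(1.5 * yi, 1.0) for _ in range(2)] +
         [rng.gauss(0.0, 1.0) for _ in range(4)] for yi in y]
    # Costs are set by hand (assumption); the abstract derives them via ROC analysis.
    members = build_ensemble(X[:150], y[:150], n_members=8, subspace_size=3,
                             c_fn=9.0, c_fp=1.0, rng=rng)
    weights = evolve_weights(members, X[150:], y[150:], rng)
    print("weights:", [round(w, 2) for w in weights])
    print("validation G-mean:", round(g_mean(members, weights, X[150:], y[150:]), 3))
```

Because a weight of 0 removes a member from the vote, a single real-valued chromosome encodes both classifier selection and weight assignment. The printed G-mean is measured on the same split used for the evolutionary search, so it is optimistic; an honest estimate would need a separate test split.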

Keywords

machine learning, classifier ensemble, multiple classifier system, imbalanced classification, cost-sensitive, malware detection

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland