Evolutionary Cost-Sensitive Ensemble for Malware Detection
Malware detection is among the most extensively developed areas of computer security. Unauthorized, malicious software can cause costly damage to both private users and companies. It can destroy the computer, breach the user's privacy, and result in the loss of valuable data. The amount of data uploaded and downloaded each day makes manual screening of each incoming software package almost impossible. That is why there is a need for effective intelligent filters that can automatically discriminate between safe and dangerous applications. The number of malware programs faced by a detection system is typically much smaller than the number of legitimate programs. Therefore, we have to deal with an imbalanced classification problem, in which standard classification algorithms tend to fail. In this paper, we present a novel ensemble based on cost-sensitive decision trees. Individual classifiers are constructed according to an established cost matrix and trained on random feature subspaces to ensure that they are mutually complementary. Instead of using a fixed cost matrix, we derive its parameters via ROC analysis. An evolutionary algorithm is applied for simultaneous classifier selection and assignment of committee member weights for the fusion process. Experimental analysis, carried out on a large malware dataset, shows that our method is capable of outperforming other state-of-the-art algorithms and hence is an effective approach to the problem of imbalanced malware detection.
Keywords: machine learning, classifier ensemble, multiple classifier system, imbalanced classification, cost-sensitive, malware detection
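The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each committee member outputs a malware probability, that a weight of zero stands for a member dropped during selection, and that the final label minimizes expected misclassification cost. The function names, weights, and cost values are hypothetical and fixed by hand here, whereas the paper derives the costs via ROC analysis and the weights via an evolutionary algorithm.

```python
from typing import Sequence


def fuse_scores(member_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-member malware probabilities.

    A weight of 0 removes a member from the committee (hypothetical
    encoding of classifier selection; assumed, not taken from the paper).
    """
    if len(member_scores) != len(weights):
        raise ValueError("one weight is required per committee member")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one member must have a positive weight")
    return sum(w * s for w, s in zip(weights, member_scores)) / total


def min_cost_label(p_malware: float, cost_fn: float, cost_fp: float) -> int:
    """Return 1 (malware) or 0 (benign), whichever has lower expected cost.

    cost_fn: cost of missing malware (false negative).
    cost_fp: cost of flagging a benign program (false positive).
    Correct decisions are assumed to cost nothing.
    """
    expected_cost_if_benign = p_malware * cost_fn
    expected_cost_if_malware = (1.0 - p_malware) * cost_fp
    return 1 if expected_cost_if_malware < expected_cost_if_benign else 0


# Illustrative values only: three members, the second one deselected.
scores = [0.30, 0.10, 0.45]
weights = [0.5, 0.0, 0.5]
fused = fuse_scores(scores, weights)

# With equal costs the fused score stays below 0.5 and the sample is
# labelled benign; a higher false-negative cost flips the decision.
label_equal_costs = min_cost_label(fused, cost_fn=1.0, cost_fp=1.0)
label_cost_sensitive = min_cost_label(fused, cost_fn=5.0, cost_fp=1.0)
```

The same fused score yields different labels under the two cost settings, which is why a cost matrix matters when malware is the minority class: raising the false-negative cost lowers the effective decision threshold.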