Unmasking the common traits: an ensemble approach for effective malware detection

  • Regular Contribution
International Journal of Information Security

Abstract

Malware detection has become a critical aspect of ensuring the security and integrity of computer systems. With the ever-evolving landscape of malicious software, developing effective detection methods is of utmost importance. This study focuses on the identification of important features for malware detection methods, aiming to enhance the accuracy and efficiency of such systems. In this work, we propose an ensemble approach called FRAMC to identify the key features that contribute significantly to the detection of malware. The effectiveness of FRAMC is assessed using different types of classifiers on a number of real-world malware datasets. The outcomes of our analysis demonstrate that the proposed approach excels in terms of performance when compared to other methods.
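The abstract does not spell out how FRAMC combines its constituent feature selectors, so the following is only a minimal sketch of the general idea behind ensemble feature selection, not the authors' method. It assumes two illustrative scorers (absolute Pearson correlation and mutual information for discrete features) and a simple rank-sum (Borda-style) aggregation. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between a feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return abs(cov / math.sqrt(vx * vy))


def mutual_information(xs, ys):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(scores):
    """Convert scores to ranks (0 = best); ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


def ensemble_select(X, y, scorers, k):
    """Sum each feature's rank across all scorers and keep the k best features."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    total_rank = [0] * n_features
    for scorer in scorers:
        scores = [scorer(col, y) for col in columns]
        for j, rank in enumerate(rank_features(scores)):
            total_rank[j] += rank
    order = sorted(range(n_features), key=lambda j: (total_rank[j], j))
    return order[:k]


# Toy data: feature 0 equals the label, feature 1 is independent of it,
# and feature 2 agrees with the label on 6 of 8 samples.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1],
    [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0],
]
print(ensemble_select(X, y, [pearson_abs, mutual_information], k=2))  # [0, 2]
```

Rank aggregation is used here because the raw scores of different selectors live on different scales. Summing ranks avoids having to normalize them, at the cost of discarding how large the score gaps are.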

Data availability

The data associated with this article is available upon request via email.

Acknowledgements

This work was supported by the Ministry of Electronics and Information Technology, Govt. of India.

Author information

Contributions

Parthajit Borah: Methodology, Experimentation, Validation, and Writing. Upasana Sarmah: Data Collection, Experimentation, and Data Analysis. D.K. Bhattacharyya: Writing–Review and Supervision. J.K. Kalita: Review, Editing, and Supervision.

Corresponding author

Correspondence to Parthajit Borah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Borah, P., Sarmah, U., Bhattacharyya, D.K. et al. Unmasking the common traits: an ensemble approach for effective malware detection. Int. J. Inf. Secur. (2024). https://doi.org/10.1007/s10207-024-00854-8
