Skip to main content

Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction

  • Conference paper
  • First Online:
Cyber Security, Cryptology, and Machine Learning (CSCML 2023)


Ransomware is a critical security concern, and developing applications for ransomware detection is paramount. Machine learning models are helpful in detecting and classifying ransomware. However, the high dimensionality of ransomware datasets divided into various feature groups such as API calls, Directory, and Registry logs has made it difficult for researchers to create effective machine learning models. Class imbalance also leads to poor results when classifying ransomware families. To tackle these challenges, in this paper we propose a three-stage feature selection method that effectively reduces the dimensionality of the data and considers the varying importance of the different feature groups in the classification of ransomware families. We also applied cost-sensitive learning and re-sampling of the training data using SMOTE to address data imbalance. We applied these techniques to the Elderan ransomware dataset. Our results show that the proposed feature selection method significantly improves the detection of ransomware compared to other state-of-art studies using the same dataset. Furthermore, the data balancing techniques (cost-sensitive learning and SMOTE) were effective in the multi-class classification of ransomware.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions


  1. Abbasi, M.S., Al-Sahaf, H., Welch, I.: Particle swarm optimization: a wrapper-based feature selection method for ransomware detection and classification. In: Castillo, P.A., Jiménez Laredo, J.L., Fernández de Vega, F. (eds.) EvoApplications 2020. LNCS, vol. 12104, pp. 181–196. Springer, Cham (2020).

    Chapter  Google Scholar 

  2. Almomani, I., et al.: Android ransomware detection based on a hybrid evolutionary approach in the context of highly imbalanced data. IEEE Access 9, 57674–57691 (2021)

    Article  Google Scholar 

  3. Almousa, M., Basavaraju, S., Anwar, M.: Api-based ransomware detection using machine learning-based threat detection models. In: 2021 18th International Conference on Privacy, Security and Trust (PST), pp. 1–7. IEEE (2021)

    Google Scholar 

  4. Aurangzeb, S., Anwar, H., Naeem, M.A., Aleem, M.: BigRC-EML: big-data based ransomware classification using ensemble machine learning. Clust. Comput. 25(5), 3405–3422 (2022)

    Article  Google Scholar 

  5. Avila, R., Khoury, R., Pere, C., Khanmohammadi, K.: Employing feature selection to improve the performance of intrusion detection systems. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds.) FPS 2021. LNCS, vol. 13291, pp. 93–112. Springer, Cham (2022).

    Chapter  Google Scholar 

  6. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newslett. 6(1), 20–29 (2004)

    Article  Google Scholar 

  7. Beaman, C., Barkworth, A., Akande, T.D., Hakak, S., Khan, M.K.: Ransomware: recent advances, analysis, challenges and future research directions. Comput. Secur. 111, 102490 (2021)

    Article  Google Scholar 

  8. Bolón-Canedo, V., Alonso-Betanzos, A.: Ensembles for feature selection: a review and future trends. Inf. Fusion 52, 1–12 (2019)

    Article  Google Scholar 

  9. Brownlee, J.: Imbalanced classification with Python: Better Metrics, Balance Skewed Classes, Cost-sensitive Learning. Machine Learning Mastery (2020)

    Google Scholar 

  10. Cai, J., Luo, J., Wang, S., Yang, S.: Feature selection in machine learning: a new perspective. Neurocomputing 300, 70–79 (2018)

    Article  Google Scholar 

  11. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

    Article  MATH  Google Scholar 

  12. Chen, Q., Bridges, R.A.: Automated behavioral analysis of malware: a case study of wannacry ransomware. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 454–460. IEEE (2017)

    Google Scholar 

  13. Collier, R.: NHS ransomware attack spreads worldwide (2017)

    Google Scholar 

  14. Cyber Security Policy: Securing cyber resilience in health and care: October 2018 progress update (2018).

  15. Goyal, M., Kumar, R.: Machine learning for malware detection on balanced and imbalanced datasets. In: 2020 International Conference on Decision Aid Sciences and Application (DASA), pp. 867–871. IEEE (2020)

    Google Scholar 

  16. Khan, F., Ncube, C., Ramasamy, L.K., Kadry, S., Nam, Y.: A digital DNA sequencing engine for ransomware detection using machine learning. IEEE Access 8, 119710–119719 (2020)

    Article  Google Scholar 

  17. Kshetri, N., Voas, J.: Do crypto-currencies fuel ransomware? IT Prof. 19(5), 11–15 (2017)

    Article  Google Scholar 

  18. Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2017)

    Article  Google Scholar 

  19. Ma, Y., He, H.: Imbalanced Learning: Foundations, Algorithms, and Applications (2013)

    Google Scholar 

  20. McIntosh, T., Kayes, A., Chen, Y.P.P., Ng, A., Watters, P.: Ransomware mitigation in the modern era: a comprehensive review, research challenges, and future directions. ACM Comput. Surv. (CSUR) 54(9), 1–36 (2021)

    Article  Google Scholar 

  21. Meland, P.H., Bayoumy, Y.F.F., Sindre, G.: The ransomware-as-a-service economy within the darknet. Comput. Secur. 92, 101762 (2020)

    Article  Google Scholar 

  22. Moreira, C.C., de Sales Jr, C.D.S., Moreira, D.C.: Understanding ransomware actions through behavioral feature analysis. J. Commun. Inf. Syst. 37(1), 61–76 (2022)

    Google Scholar 

  23. Pang, Y., Peng, L., Chen, Z., Yang, B., Zhang, H.: Imbalanced learning based on adaptive weighting and gaussian function synthesizing with an application on android malware detection. Inf. Sci. 484, 95–112 (2019)

    Article  Google Scholar 

  24. Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19(4), 639–668 (2011)

    Article  Google Scholar 

  25. Sgandurra, D., Muñoz-González, L., Mohsen, R., Lupu, E.C.: Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv preprint arXiv:1609.03020 (2016)

  26. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J 27(3), 379–423 (1948)

    Article  MathSciNet  MATH  Google Scholar 

  27. Thabtah, F., Hammoud, S., Kamalov, F., Gonsalves, A.: Data imbalance in classification: experimental evaluation. Inf. Sci. 513, 429–441 (2020)

    Article  MathSciNet  Google Scholar 

  28. Thai-Nghe, N., Gantner, Z., Schmidt-Thieme, L.: Cost-sensitive learning methods for imbalanced data. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2010)

    Google Scholar 

  29. Urdan, T.C.: Statistics in Plain English. Routledge, Abingdon (2011)

    Book  MATH  Google Scholar 

  30. Wu, D., Guo, P., Wang, P.: Malware detection based on cascading XGboost and cost sensitive. In: 2020 International Conference on Computer Communication and Network Security (CCNS), pp. 201–205. IEEE (2020)

    Google Scholar 

Download references


This work was funded by Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Faithful Chiagoziem Onwuegbuche .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Onwuegbuche, F.C., Jurcut, A.D., Pasquale, L. (2023). Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction. In: Dolev, S., Gudes, E., Paillier, P. (eds) Cyber Security, Cryptology, and Machine Learning. CSCML 2023. Lecture Notes in Computer Science, vol 13914. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34670-5

  • Online ISBN: 978-3-031-34671-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics