RAMD: registry-based anomaly malware detection using one-class ensemble classifiers

Applied Intelligence

Abstract

Malware continuously evolves and grows more sophisticated to avoid detection. Windows has traditionally been the most popular target for malware writers because of its dominance in the desktop operating system market. However, despite the large volume of new Windows malware samples collected daily, relatively little research focuses on Windows malware. Programs in Windows make heavy use of the Windows Registry, or simply the registry, which makes it a good source for detecting malicious behavior. In this paper, we present RAMD, a novel approach that uses an ensemble of one-class classifiers to detect known and, especially, unknown malware that abuses registry keys and values for malicious purposes. RAMD builds a model of the registry behavior of benign programs and then uses this model to detect malware by looking for anomalous registry accesses. Specifically, it constructs an initial ensemble classifier by training multiple one-class classifiers and then applies a novel swarm intelligence pruning algorithm, called memetic firefly-based ensemble classifier pruning (MFECP), to reduce the size of the ensemble by selecting only a subset of one-class classifiers that are highly accurate and diverse in their outputs. To combine the outputs of the one-class classifiers in the pruned ensemble, RAMD uses an aggregation operator called Fibonacci-based superincreasing ordered weighted averaging (FSOWA). Experiments on a dataset of benign and malware samples show that RAMD achieves a detection rate of about 98.52%, a false alarm rate of 2.19%, and an accuracy of 98.43%.
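The aggregation step can be illustrated with a plain ordered weighted averaging (OWA) operator in the sense of Yager (1988), where weights attach to rank positions rather than to particular classifiers. The sketch below is not the FSOWA operator itself, since the abstract does not give its weight construction. The function names, the powers-of-two weight vector (used only as a simple stand-in for a superincreasing sequence), the anomaly scores, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Sequence


def is_superincreasing(seq: Sequence[float]) -> bool:
    """Return True if every term exceeds the sum of all terms before it."""
    running = 0.0
    for x in seq:
        if x <= running:
            return False
        running += x
    return True


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to sorted positions, not sources."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    ordered = sorted(scores, reverse=True)  # largest score gets the first weight
    return sum(w * s for w, s in zip(weights, ordered)) / total


# Hypothetical superincreasing sequence (powers of two), NOT the paper's weights.
raw = [1, 2, 4, 8]
assert is_superincreasing(raw)
weights = list(reversed(raw))  # heaviest weight on the top-ranked score

# Hypothetical anomaly scores from four one-class classifiers for one sample.
scores = [0.9, 0.2, 0.7, 0.4]
combined = owa(scores, weights)

THRESHOLD = 0.5  # assumed decision threshold, chosen only for this example
label = "anomalous" if combined >= THRESHOLD else "benign"
print(f"combined score = {combined:.4f} -> {label}")
```

Because the weights are normalized by their sum, the combined score stays in the range of the inputs. Placing the largest weight on the top-ranked score makes the aggregate lean toward the most suspicious classifier output, which is one possible design choice among many.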

Author information

Corresponding author

Correspondence to Mahdi Abadi.

About this article

Cite this article

Tajoddin, A., Abadi, M. RAMD: registry-based anomaly malware detection using one-class ensemble classifiers. Appl Intell 49, 2641–2658 (2019). https://doi.org/10.1007/s10489-018-01405-0

  • DOI: https://doi.org/10.1007/s10489-018-01405-0
