HybriDroid: an empirical analysis on effective malware detection model developed using ensemble methods

Abstract

Detecting malware on smartphones remains a challenging problem for academics and researchers. In this paper, we apply five distinct machine learning algorithms and three different ensemble methods to develop a model for detecting malware on Android-based smartphones. We propose a framework that helps select the right sets of features, with the aim of improving the performance of malware detection models. The proposed framework is validated using two distinct performance parameters, accuracy and F-measure, as benchmarks for detecting malware in real-world apps. We performed an empirical study on thirty different categories of Android apps. The experimental data set consists of 194,659 benign apps and 67,538 malware apps collected from different repositories. Empirical results reveal that models developed using the proposed feature selection framework detect more malware-infected apps than models built on all extracted feature sets. Moreover, the malware detection model built using the nonlinear ensemble decision tree forest (NDTF) approach achieves a detection rate of 98.8%. In addition, the proposed framework is more effective at detecting malware-infected apps than different anti-virus scanners and the frameworks or approaches developed in the literature.
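
The two performance parameters named above, accuracy and F-measure, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy labels, the three hypothetical classifiers `clf_a`, `clf_b` and `clf_c`, and the plain majority vote used to combine them. It is not the NDTF approach or any other ensemble method from the paper, only a generic example of how an ensemble's predictions might be scored.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn), treating `positive` as the malware label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all apps that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the malware class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_vote(votes):
    """Return the label predicted by most classifiers for one app."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
clf_a = [1, 1, 0, 1, 0, 0, 1, 0]
clf_b = [1, 0, 1, 1, 0, 1, 0, 0]
clf_c = [1, 1, 1, 0, 0, 0, 0, 1]

# Combine the three classifiers app by app with a simple majority vote.
ensemble = [majority_vote(votes) for votes in zip(clf_a, clf_b, clf_c)]

tp, tn, fp, fn = confusion_counts(y_true, clf_a)
single_accuracy = accuracy(tp, tn, fp, fn)
single_f = f_measure(tp, fp, fn)

tp_e, tn_e, fp_e, fn_e = confusion_counts(y_true, ensemble)
ensemble_accuracy = accuracy(tp_e, tn_e, fp_e, fn_e)
ensemble_f = f_measure(tp_e, fp_e, fn_e)
```

In this toy data each classifier makes two mistakes on different apps, so the majority vote recovers every label. Real data would rarely behave this cleanly.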

Notes

  1. https://play.google.com/store?hl=en

  2. https://buildfire.com/app-statistics/

  3. https://buildfire.com/app-statistics/

  4. https://nypost.com/2017/11/08/americans-check-their-phones-80-times-a-day-study/

  5. https://www.smartinsights.com/mobile-marketing/mobile-marketing-analytics/mobile-marketing-statistics/

  6. https://www.statista.com/statistics/271644/worldwide-free-and-paid-mobile-app-store-downloads/

  7. https://developer.android.com/training/permissions/requesting.html

  8. https://www.mcafee.com/enterprise/en-us/assets/reports/rp-mobile-threat-report-2019.pdf

  9. https://doi.org/10.17632/dc7wytfhsm.1

  10. https://code.google.com/archive/p/androguard/

  11. http://anubis.iseclab.org/

  12. http://copperdroid.isg.rhul.ac.uk/copperdroid/index.php

  13. https://play.google.com/store?hl=en

  14. https://en.softonic.com/android

  15. https://www.androidauthority.com/apps/

  16. https://download.cnet.com/android/

  17. http://sanddroid.xjtu.edu.cn:8080/

  18. https://www.virustotal.com/

  19. https://www.microsoft.com/en-in/windows/comprehensive-security

  20. https://github.com/ArvindMahindru66/Computer-and-security-dataset/blob/master/Computer%20and%20security%20file%201.xlsx

Author information

Corresponding author

Correspondence to Arvind Mahindru.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mahindru, A., Sangal, A.L. HybriDroid: an empirical analysis on effective malware detection model developed using ensemble methods. J Supercomput 77, 8209–8251 (2021). https://doi.org/10.1007/s11227-020-03569-4

Keywords

  • Permissions
  • API calls
  • Feature selection methods
  • Android apps
  • Machine learning