
Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine

Original Article
Neural Computing and Applications

Abstract

The widespread use of smartphones in recent years has led to a significant rise in both the number and the sophistication of malicious Android applications (apps) targeting smartphone users. Android-based smartphones attract attackers more than other smartphones because of the popularity and open-source nature of the Android platform. Because these devices store confidential and private information, malware apps pose a potential threat to their security. To improve the classification efficiency of malicious Android apps, this paper proposes a hybrid approach to Android malware classification that integrates the fuzzy C-means (FCM) clustering algorithm with the light gradient boosting machine (LightGBM). First, FCM is used to generate highly significant clusters of Android app permissions that reflect certain characteristics of the apps. The primary goal of using fuzzy clustering is to produce new features by grouping Android app permissions with related patterns into clusters. Second, LightGBM takes the apps' permissions and the clusters produced by FCM as inputs and, after training, outputs the classification of each app as malware or goodware. LightGBM is chosen for its high learning efficiency and precise classification. The significance of the proposed approach is demonstrated through several experiments on a well-known dataset containing benign and malicious Android apps. The experimental results show that the proposed approach outperforms the compared approaches, attaining the highest accuracy (94.63%), area under the curve (AUC) (98.74%), and precision (97.70%).
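The abstract describes two stages: FCM turns permission data into cluster-based features, and LightGBM is trained on the permissions together with those features. The sketch below illustrates only the first stage, under assumptions that the abstract does not confirm: each app is represented as a binary permission vector (1 = permission requested), standard Bezdek FCM is run over these app vectors with fuzzifier m = 2, and each app's membership degrees are appended to its permission vector as extra features. The helper names (`fuzzy_c_means`, `augment_with_memberships`) and the toy data are hypothetical; the paper's actual clustering setup (what exactly is clustered, the number of clusters, the value of m) is not given here.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means (Bezdek). Returns (centers, memberships).

    data: list of equal-length numeric vectors (here: binary permission vectors).
    c: number of clusters; m > 1: fuzzifier controlling how soft memberships are.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial membership matrix U (n x c); each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: v_j = sum_i u_ij^m * x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / w_sum
                            for k in range(d)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        exponent = 2.0 / (m - 1.0)
        max_change = 0.0
        for i in range(n):
            dist = [math.dist(data[i], centers[j]) for j in range(c)]
            zero = [j for j in range(c) if dist[j] == 0.0]
            if zero:
                # Point coincides with one or more centres: share membership among them.
                new_row = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dist[j] / dist[k]) ** exponent for k in range(c))
                           for j in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        # Stop once memberships have (approximately) stopped changing.
        if max_change < tol:
            break
    return centers, u


def augment_with_memberships(vectors, memberships):
    """Append each app's FCM membership degrees to its permission vector."""
    return [list(x) + list(mu) for x, mu in zip(vectors, memberships)]


# Toy data (hypothetical): 6 apps x 6 permissions, two visibly different usage patterns.
apps = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
centers, memberships = fuzzy_c_means(apps, c=2)
features = augment_with_memberships(apps, memberships)
for row in features:
    print([round(v, 2) for v in row])
```

The augmented rows in `features`, paired with malware/goodware labels, could then be passed to a gradient-boosted tree classifier, for example `lightgbm.LGBMClassifier` through its `fit(X, y)` method; the hyperparameters used in the paper are not stated in this abstract. Soft memberships are used here rather than hard cluster labels because they keep information about apps that sit between permission patterns.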

Acknowledgment

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. G: 12-830-1441. The authors, therefore, gratefully acknowledge the DSR for technical and financial support.

Author information

Corresponding author

Correspondence to Altyeb Altaher Taha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Taha, A.A., Malebary, S.J. Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine. Neural Comput & Applic 33, 6721–6732 (2021). https://doi.org/10.1007/s00521-020-05450-0

  • DOI: https://doi.org/10.1007/s00521-020-05450-0
