Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine

Taha, Altyeb Altaher; Malebary, Sharaf Jameel

doi:10.1007/s00521-020-05450-0

Original Article
Published: 30 October 2020

Volume 33, pages 6721–6732, (2021)

Neural Computing and Applications

Abstract

The widespread use of smartphones in recent years has led to a significant rise in the number and sophistication of malicious Android applications (apps) targeting smartphone users. Android-based smartphones attract attackers more than other smartphones because of the popularity and open-source nature of the Android environment. Malware apps threaten the security of Android-based smartphones because these devices store confidential and private information. To improve the classification of malicious Android apps, this paper proposes a hybrid approach that integrates the fuzzy C-means (FCM) clustering algorithm with the light gradient boosting machine (LightGBM). First, fuzzy clustering is used to generate highly significant clusters of the Android apps’ permissions that reflect certain characteristics of the apps. The primary goal of the fuzzy clustering step is to produce new features by grouping Android app permissions with related patterns into clusters. Second, LightGBM takes the app’s permissions and the clusters produced by FCM as inputs and, after training, classifies each app as malware or goodware. The advantages of LightGBM are its high learning efficiency and precise classification. The significance of the proposed approach is illustrated by several experiments on a well-known dataset containing benign and malware Android apps. The experimental results show that the proposed approach outperforms the other approaches compared and attains the highest accuracy (94.63%), area under the curve (AUC) (98.74%) and precision (97.70%).
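The first stage described in the abstract, turning permission vectors into additional cluster-based features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the clusters are encoded, so appending each app's fuzzy membership degrees to its permission vector is an assumption, and the function names (`fuzzy_c_means`, `add_cluster_features`), the fuzzifier `m=2.0`, the iteration count and the toy permission matrix are all hypothetical choices made for illustration.

```python
import random


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Plain fuzzy C-means. Returns (centers, U), where U[i][k] is the
    membership degree of sample i in cluster k and each row of U sums to 1."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the samples weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * X[i][j] for i in range(n)) / w_sum for j in range(d)]
            )
        # Membership update: proportional to distance**(-2 / (m - 1)).
        for i in range(n):
            dist = [
                max(sum((X[i][j] - c[j]) ** 2 for j in range(d)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
            inv_sum = sum(inv)
            U[i] = [v / inv_sum for v in inv]
    return centers, U


def add_cluster_features(X, U):
    """Append each sample's membership degrees to its permission vector
    (an assumed encoding of the 'clusters' used as extra features)."""
    return [list(x) + list(u) for x, u in zip(X, U)]


# Hypothetical toy data: rows are apps, columns are binary permission flags.
permissions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
centers, memberships = fuzzy_c_means(permissions, n_clusters=2)
augmented = add_cluster_features(permissions, memberships)
```

In the second stage, a matrix such as `augmented` would be passed, together with the malware/goodware labels, to a gradient boosting classifier such as LightGBM; that step is left out here so the sketch has no third-party dependencies.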

Acknowledgment

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. G: 12-830-1441. The authors, therefore, acknowledge with thanks DSR for technical and financial support.

Author information

Authors and Affiliations

Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh, 21911, Saudi Arabia
Altyeb Altaher Taha & Sharaf Jameel Malebary

Corresponding author

Correspondence to Altyeb Altaher Taha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Taha, A.A., Malebary, S.J. Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine. Neural Comput & Applic 33, 6721–6732 (2021). https://doi.org/10.1007/s00521-020-05450-0

Received: 20 May 2020
Accepted: 14 October 2020
Published: 30 October 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00521-020-05450-0
