Abstract
The widespread use of smartphones in recent years has led to a significant rise in both the number and the sophistication of malicious Android applications (apps) targeting smartphone users. Android-based smartphones attract attackers more than other smartphones because of the popularity and open-source nature of the Android environment. Because these devices store confidential and private information, malware apps pose a serious threat to their security. To improve the classification efficiency of Android malicious apps, this paper proposes a hybrid approach for the classification of Android malware that integrates the fuzzy C-means clustering (FCM) algorithm with the light gradient boosting machine (LightGBM). First, FCM is used to generate highly significant clusters of the Android apps’ permissions that reflect certain characteristics of the apps. The primary goal of using fuzzy clustering is to produce new features by grouping Android app permissions with related patterns into clusters. Second, LightGBM takes the app’s permissions and the clusters resulting from FCM as inputs and, after training, outputs the classification result: malware or goodware. The advantages of LightGBM are its high learning efficiency and precise classification. The significance of the proposed approach is illustrated through several experiments on a well-known dataset containing benign and malware Android apps. The experimental results show that the proposed approach outperforms the other approaches compared and attains the highest accuracy (94.63%), area under the curve (AUC) (98.74%) and precision (97.70%).
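The first stage described in the abstract, deriving new features from fuzzy clusters of app permissions, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each app is a binary permission vector, that FCM is run over these vectors, and that each app's membership degrees are appended to its permissions as extra features. The toy permission matrix, the function names, the fuzzifier `m = 2` and the deterministic choice of initial centres are all assumptions made for illustration.

```python
def memberships_for(points, centers, m):
    """FCM membership degrees of every point to every centre (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Euclidean distances, floored to avoid division by zero.
        dist = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers]
        rows.append([1.0 / sum((dk / dj) ** power for dj in dist) for dk in dist])
    return rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy C-means for n_clusters >= 2; returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Simplification (assumption): evenly spaced rows act as initial centres.
    centers = [list(points[k * (n - 1) // (n_clusters - 1)])
               for k in range(n_clusters)]
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        # Centre update: mean of the points weighted by membership ** m.
        new_centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                                for d in range(dim)])
        shift = max(abs(a - b)
                    for c_old, c_new in zip(centers, new_centers)
                    for a, b in zip(c_old, c_new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(points, centers, m)


def augment_with_memberships(points, memberships):
    """Append each app's cluster membership degrees to its permission vector."""
    return [list(p) + list(u) for p, u in zip(points, memberships)]


# Hypothetical toy data: rows are apps, columns are six made-up permissions.
apps = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
centers, memberships = fuzzy_c_means(apps, n_clusters=2)
features = augment_with_memberships(apps, memberships)
```

In the second stage, a feature matrix like `features` together with malware/goodware labels would be passed to a gradient boosting classifier, for example `lightgbm.LGBMClassifier`. That step is omitted here to keep the sketch free of third-party dependencies.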
Acknowledgment
This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. G: 12-830-1441. The authors, therefore, acknowledge with thanks DSR for technical and financial support.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Taha, A.A., Malebary, S.J. Hybrid classification of Android malware based on fuzzy clustering and the gradient boosting machine. Neural Comput & Applic 33, 6721–6732 (2021). https://doi.org/10.1007/s00521-020-05450-0
DOI: https://doi.org/10.1007/s00521-020-05450-0