Abstract
This paper presents MLDroid, a web-based framework that detects malware on Android devices. As Android devices have grown in popularity, malware developers release new malware daily, threatening system integrity and user privacy. The proposed framework detects malware in Android apps by performing dynamic analysis. To detect malware in real-world apps, we train the framework on features chosen by feature selection approaches, and we then use the selected features to build models with different machine learning algorithms. The experiment was performed on more than 500,000 Android apps. The empirical results show that the model built by combining four distinct machine learning algorithms in parallel (a deep learning algorithm, farthest-first clustering, Y-MLP and a nonlinear ensemble decision tree forest approach), with rough set analysis as the feature subset selection algorithm, achieved the highest detection rate, 98.8%, on real-world apps.
Notes
VirusTotal is used to identify the class of each app.
The categories of classes are identified using the VirusTotal antivirus scanner.
In our study, we do not use all extracted features when training supervised machine learning algorithms, because doing so gives lower accuracy at testing time.
On the basis of Tables 6 to 31 and the box-plot diagram, we select the best machine learning algorithm to build our web-based malware detection model. Detection of malware families using MLDroid is shown in "Appendix B."
In our study, we consider samples from 81 different malware families; their names, along with the samples, are listed in "Appendix C."
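The note above on not training with every extracted feature can be illustrated with a small feature-ranking sketch. This is not the authors' pipeline: the abstract names rough set analysis as the best-performing selector, whereas the code below swaps in a plain chi-squared ranking of binary features as a simpler stand-in. The feature names, the toy data and the helper names `chi2_score` and `select_top_k` are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical binary features observed for each app (1 = behaviour seen).
FEATURE_NAMES = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]

# Toy data, invented for illustration: one row per app, label 1 = malware.
ROWS = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
LABELS = [1, 1, 1, 0, 0, 0]


def chi2_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-squared statistic of the 2x2 table: binary feature vs. binary label."""
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f == 1 and y == 1)  # present, malware
    b = sum(1 for f, y in pairs if f == 1 and y == 0)  # present, benign
    c = sum(1 for f, y in pairs if f == 0 and y == 1)  # absent, malware
    d = sum(1 for f, y in pairs if f == 0 and y == 0)  # absent, benign
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature (or a single-class sample) carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(rows: List[List[int]], labels: Sequence[int], k: int) -> List[int]:
    """Return the column indices of the k highest-scoring features, in column order."""
    n_features = len(rows[0])
    scores = [chi2_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    kept = select_top_k(ROWS, LABELS, k=2)
    print([FEATURE_NAMES[j] for j in kept])
```

In this toy data the `INTERNET` column is set for every app, so it scores zero and is dropped, while the two columns that vary with the label are kept. A classifier would then be trained only on the retained columns.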
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mahindru, A., Sangal, A.L. MLDroid—framework for Android malware detection using machine learning techniques. Neural Comput & Applic 33, 5183–5240 (2021). https://doi.org/10.1007/s00521-020-05309-4
DOI: https://doi.org/10.1007/s00521-020-05309-4