Discovering optimal features using static analysis and a genetic search based method for Android malware detection

  • Ahmad Firdaus
  • Nor Badrul Anuar
  • Ahmad Karim
  • Mohd Faizal Ab Razak


Mobile device manufacturers are rapidly releasing a wide variety of Android versions worldwide. At the same time, cyber criminals are carrying out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals profit because so many people rely on Android for their daily routines, including important communications. With this in mind, security practitioners have conducted static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, to classify malware efficiently, static analysis should rely on as few features as possible. Therefore, we used genetic search (GS), which is a search based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers, namely, Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT gave the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
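
The idea of a genetic search over feature subsets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the majority-vote scorer standing in for a real classifier, the size penalty, and all parameter values are hypothetical, and it is not the configuration or tooling used in the study. Each candidate is a bit mask over the features; tournament selection, one-point crossover, and bit-flip mutation evolve the masks, and elitism keeps the best mask found so far.

```python
import random


def make_toy_data(n_samples=200, n_features=20, informative=(0, 3, 7), seed=0):
    """Hypothetical synthetic data: binary features, label 1 stands for 'malware'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.randint(0, 1) for _ in range(n_features)]
        for j in informative:
            # Informative features agree with the label about 90% of the time.
            row[j] = label if rng.random() < 0.9 else 1 - label
        X.append(row)
        y.append(label)
    return X, y


def accuracy(mask, X, y):
    """Majority vote over the selected features (a stand-in for a real classifier)."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for row, label in zip(X, y):
        votes = sum(row[j] for j in idx)
        pred = 1 if 2 * votes > len(idx) else 0
        correct += pred == label
    return correct / len(y)


def fitness(mask, X, y, penalty=0.01):
    """Reward accuracy, penalise subset size (assumed trade-off, not from the paper)."""
    return accuracy(mask, X, y) - penalty * sum(mask)


def genetic_search(X, y, pop_size=30, generations=40, p_cross=0.6, p_mut=0.03, seed=1):
    """Evolve bit masks over features and return the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, X, y))
    for _ in range(generations):
        scored = [(fitness(m, X, y), m) for m in pop]

        def pick():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with probability p_mut per position.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, X, y))
    return best


if __name__ == "__main__":
    X, y = make_toy_data()
    mask = genetic_search(X, y)
    selected = [j for j, bit in enumerate(mask) if bit]
    print("selected features:", selected)
    print("accuracy on toy data:", accuracy(mask, X, y))
```

In a real pipeline the scorer would be replaced by a cross-validated classifier or a subset-merit measure, and the selected subset would then be passed to classifiers such as those named in the abstract.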

Key words

Genetic algorithm; Static analysis; Android; Malware; Machine learning






Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia
  2. Faculty of Computer System & Software Engineering, University Malaysia Pahang, Gambang, Malaysia
  3. Department of Information Technology, Bahauddin Zakariya University, Multan, Pakistan
